Preface



This volume of the Lecture Notes in Computer Science series contains the pro- 
ceedings of the Second Conference on Numerical Analysis and Applications, 
which was held at the University of Rousse, Bulgaria, June 11-15, 2000. The 
conference was organized by the Department of Numerical Analysis and Statis- 
tics at the University of Rousse with support from the University of Southern 
Mississippi, Hattiesburg. The conference was co-sponsored by SIAM (Society 
for Industrial and Applied Mathematics) and ILAS (International Linear Al- 
gebra Society). The official sponsors of the conference were Fujitsu America, 
Inc., Hewlett-Packard GmbH, and Sun Microsystems. We would like to give our
sincere thanks to all sponsors and co-sponsors for their timely support.

The second conference continued the tradition of the first one (1996 in 
Rousse) as a forum, where scientists from leading research groups from the 
“East” and “West” are provided with the opportunity to meet and exchange 
ideas and establish research cooperation. More than 120 scientists from 31 coun- 
tries participated in the conference. 

A wide range of problems concerning recent achievements in numerical anal- 
ysis and its applications in physics, chemistry, engineering, and economics were 
discussed. An extensive exchange of ideas between scientists who develop and 
study numerical methods and researchers who use them for solving real-life prob- 
lems took place during the conference. 

We are indebted to our colleagues who helped us in the organization of this 
conference. We thank the organizers of the mini-symposia for attracting active 
and highly qualified researchers. 
Sensitivity Analysis of the Expected
Accumulated Reward Using Uniformization and
IRK3 Methods



Haïscam Abdallah and Moulaye Hamza
IRISA 

Campus de Beaulieu, 35042 Rennes cedex, France 
Abstract. This paper deals with the sensitivity computation of the expected
accumulated reward of stiff Markov models. Generally, we are
faced with the problem of computation time, especially when the Markov 
process is stiff. We consider the standard uniformization method for 
which we propose a new error bound. Because the time complexity of 
this method becomes large when the stiffness increases, we then sug- 
gest an ordinary differential equations method, the third order implicit 
Runge-Kutta method. After providing a new way of writing the system 
of equations to be solved, we apply this method with a stepsize choice 
different from the classical one in order to accelerate the algorithm exe- 
cution. Finally, we compare the time complexity of both of the methods 
on a numerical example. 



1 Introduction 

As the use of computing systems increases, the requirement of analyzing both
their performance and reliability has become more important. Reward Markov
models are common tools for modelling the behaviour of such systems. Doing so, a
Continuous-Time Markov Chain (CTMC) is used to represent changes in the 
system’s structure, usually caused by faults and repairs of its components, and 
reward rates are assigned to the states of the model. Each reward represents 
the state performance of the system in a particular configuration. For these 
models, it may be of interest to evaluate not only some instantaneous transient 
measures, but also some cumulative ones such as the Expected Accumulated 
Reward (EAR) over a given interval [0,t], t being the system’s mission time. As 
the input parameters used to define the Markov models (fault rates, repair rates, 
etc.) are most of the time estimated from few experimental observations, the 
transient solutions are subject to uncertainties. Therefore, it becomes necessary 
to introduce parametric sensitivity analysis, the computation of derivatives of 
system measures with respect to input parameters. Generally, we are faced with 
the problem of computation time, especially when the Markov model is stiff, i.e., 
when the failure rates are much smaller than the repair rates. In this paper, we 
focus on the computation of the sensitivity of the EAR of stiff Markov models. 



L. Vulkov, J. Wasniewski, and P. Yalamov (Eds.): NAA 2000, LNCS 1988, pp. 1—9, 2001. 
We consider two numerical methods: the Standard Uniformization (SU) method
and the third order Implicit Runge-Kutta (IRK3) method.

The SU method consists in expressing the EAR [1] and its sensitivity in the form
of an infinite sum. The main advantage of this method is that, for a given
tolerance, truncating this infinite sum allows the global error to be bounded.
We propose to derive a new error bound. Unfortunately, when the models are
stiff and the mission time is large, the computation time becomes prohibitive. We
then suggest the L-stable Ordinary Differential Equations (ODE) IRK3 method.
This method has been used to compute the instantaneous state probability
vector [2] and its sensitivity [3]. In order to compute the sensitivity of the EAR by
this method, we first provide a new way of writing a non-homogeneous ODE
as a system of the form $y' = Ay$, where $A$ is a constant matrix. Next, we choose
a new stepsize to accelerate the execution of the IRK3 algorithm.

The paper is organized as follows: the following section sets the problem. In
Section 3, the SU technique is presented and a new bound is provided. Section 4
is devoted to the IRK3 method and the new stepsize choice. A concrete example
and a comparison of the two methods from a time complexity point of view
are given in Section 5.



2 Problem Formulation 



Consider a computing system modelled by a CTMC, say $X = \{X_t,\ t \ge 0\}$,
defined over a finite state space $\mathbb{E} = \{1, 2, \ldots, M\}$. Let $R = (r_i)$ be the reward
rate vector; $r_i$ denotes the reward rate assigned to state $i$ of $X$. We suppose the
transition rates depend on a parameter $\theta$ (failure rate, repair rate, etc.). The
infinitesimal generator (or transition rate matrix) of the CTMC $X$ is denoted by
$Q(\theta) = (q_{ij}(\theta))$. Let $\Pi(\theta,t)$ be the instantaneous state probability vector. The
EAR over the interval $[0,t]$ is defined by

$$E[Y(\theta,t)] = \sum_{i=1}^{M} r_i L_i(\theta,t) = L(\theta,t)R \quad\text{where}\quad L(\theta,t) = \int_0^t \Pi(\theta,s)\,ds. \qquad (1)$$



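The sensitivity of $E[Y(\theta,t)]$ is its partial derivative with respect to $\theta$. From (1), we
get

$$\frac{\partial}{\partial\theta}E[Y(\theta,t)] = \frac{\partial}{\partial\theta}\left[L(\theta,t)R\right] = \left[\frac{\partial}{\partial\theta}L(\theta,t)\right]R, \qquad (2)$$

given that reward rates are supposed to be constant.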

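It is known that the vector $\Pi(\theta,t)$ is the solution of the Chapman-Kolmogorov
first order linear differential equations:

$$\frac{\partial}{\partial t}\Pi(\theta,t) = \Pi(\theta,t)Q(\theta); \qquad \Pi(\theta,0) = \Pi(0) \text{ is given.} \qquad (3)$$

Then we have

$$\Pi(\theta,t) = \Pi(0)P(\theta,t), \qquad (4)$$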
where

$$P(\theta,t) = e^{Q(\theta)t} = \sum_{n=0}^{\infty} Q(\theta)^n \frac{t^n}{n!}.$$

The computation of the sensitivity of $L(\theta,t)$ may be done in two ways. The first
one consists in computing $\Pi(\theta,t)$, integrating it over $[0,t]$ and differentiating that
expression with respect to $\theta$. This is the case for the SU method. The other one
integrates system (3). A new system of equations, whose solution is $L(\theta,t)$, is
obtained. That new system is then differentiated with respect to $\theta$ and the solution of
the final system of equations is $\frac{\partial}{\partial\theta}L(\theta,t)$. In that case, we use the IRK3 method.



3 The SU Method 

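The SU technique [1], [4] transforms $Q(\theta)$ into the stochastic matrix $P(\theta) =
I + Q(\theta)/q$ where $I$ is the identity matrix and $q$ is a constant such that $q >
\max_i |q_{ii}(\theta)|$. It follows that $Q(\theta) = q(P(\theta) - I)$ and

$$P(\theta,t) = e^{q(P(\theta) - I)t} = e^{-qt}\,e^{qtP(\theta)}. \qquad (5)$$

The matrix $P(\theta,t)$ may then be written:

$$P(\theta,t) = \sum_{n=0}^{\infty} p(n,qt)\,P(\theta)^n \quad\text{where}\quad p(n,qt) = e^{-qt}\frac{(qt)^n}{n!}. \qquad (6)$$

From relation (4), we get

$$\Pi(\theta,t) = \sum_{n=0}^{\infty} p(n,qt)\,\Pi(0)P(\theta)^n. \qquad (7)$$

Defining the vector $\Pi^{(n)}(\theta)$ by $\Pi^{(n)}(\theta) = \Pi(0)P(\theta)^n$, we have recursively:

$$\Pi^{(n)}(\theta) = \Pi^{(n-1)}(\theta)P(\theta),\quad n \ge 1; \qquad \Pi^{(0)}(\theta) = \Pi(0).$$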

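The expression of the cumulative distribution $L(\theta,t)$ is obtained by integrating
relation (7):

$$L(\theta,t) = t\sum_{n=0}^{\infty}\frac{p(n,qt)}{n+1}\sum_{k=0}^{n}\Pi^{(k)}(\theta). \qquad (8)$$

The derivation of (8) with respect to $\theta$ gives the sensitivity of $L(\theta,t)$, denoted
by $SL(\theta,t)$, as follows:

$$SL(\theta,t) = \frac{\partial}{\partial\theta}L(\theta,t) = t\sum_{n=0}^{\infty}\frac{p(n,qt)}{n+1}\sum_{k=0}^{n}\frac{\partial}{\partial\theta}\Pi^{(k)}(\theta). \qquad (9)$$

The vectors $\frac{\partial}{\partial\theta}\Pi^{(k)}(\theta)$, $k \ge 1$, are such that

$$\frac{\partial}{\partial\theta}\Pi^{(k)}(\theta) = \left[\frac{\partial}{\partial\theta}\Pi^{(k-1)}(\theta)\right]P(\theta) + \Pi^{(k-1)}(\theta)\left[\frac{\partial}{\partial\theta}P(\theta)\right]; \qquad \frac{\partial}{\partial\theta}\Pi^{(0)}(\theta) = 0. \qquad (10)$$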
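In practical implementations, the previous infinite series (9) is truncated at a
step $N_S$. Let $F_L$ be the error vector on $SL(\theta,t)$. We have

$$F_L = t\sum_{n=N_S+1}^{\infty}\frac{p(n,qt)}{n+1}\sum_{k=0}^{n}\frac{\partial}{\partial\theta}\Pi^{(k)}(\theta).$$

The infinite norm of the vector $F_L$ verifies:

$$\|F_L\|_\infty \le t\sum_{n=N_S+1}^{\infty}\frac{p(n,qt)}{n+1}\sum_{k=0}^{n}\left\|\frac{\partial}{\partial\theta}\Pi^{(k)}(\theta)\right\|_\infty.$$

From relation (10), it may be established by recurrence that for all $k \in \mathbb{N}$,
$\left\|\frac{\partial}{\partial\theta}\Pi^{(k)}(\theta)\right\|_\infty \le k\left\|\frac{\partial}{\partial\theta}P(\theta)\right\|_\infty$. It follows that

$$\begin{aligned}
\|F_L\|_\infty &\le t\sum_{n=N_S+1}^{\infty}\frac{p(n,qt)}{n+1}\sum_{k=0}^{n}k\left\|\frac{\partial}{\partial\theta}P(\theta)\right\|_\infty \\
&= t\sum_{n=N_S+1}^{\infty}\frac{p(n,qt)}{n+1}\,\frac{n(n+1)}{2}\left\|\frac{\partial}{\partial\theta}P(\theta)\right\|_\infty \\
&= \frac{t}{2}\left\|\frac{\partial}{\partial\theta}P(\theta)\right\|_\infty\sum_{n=N_S+1}^{\infty}n\,p(n,qt) \\
&= \frac{t}{2}\left\|\frac{\partial}{\partial\theta}P(\theta)\right\|_\infty qt\sum_{n=N_S}^{\infty}p(n,qt) \\
&= \frac{t^2}{2}\left\|\frac{\partial}{\partial\theta}Q(\theta)\right\|_\infty\sum_{n=N_S}^{\infty}p(n,qt),
\end{aligned}$$

where the last two steps use $n\,p(n,qt) = qt\,p(n-1,qt)$ and
$\frac{\partial}{\partial\theta}P(\theta) = \frac{1}{q}\frac{\partial}{\partial\theta}Q(\theta)$.

Taking into account this bound and relation (2), the error on $\frac{\partial}{\partial\theta}E[Y(\theta,t)]$,
denoted by $F_Y$, is such that

$$F_Y \le \frac{t^2}{2}\,\|R\|_\infty\left\|\frac{\partial}{\partial\theta}Q(\theta)\right\|_\infty\sum_{n=N_S}^{\infty}p(n,qt). \qquad (11)$$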



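If $E_L$ is the error vector on $L(\theta,t)$ (relation (8)) and $N_L$ is the truncation
step of the infinite sum, we can easily show that:

$$\|E_L\|_\infty \le t\left(1 - \sum_{n=0}^{N_L}p(n,qt)\right).$$

Thus we can bound the error on $E[Y(\theta,t)]$, denoted by $E_Y$, as follows:

$$E_Y \le t\left(1 - \sum_{n=0}^{N_L}p(n,qt)\right)\|R\|_\infty. \qquad (12)$$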
The truncation error and the time complexity of the SU method for
computing $\frac{\partial}{\partial\theta}E[Y(\theta,t)]$ (and $E[Y(\theta,t)]$) depend on the truncation strategy.
Remember that, for a given tolerance error $\varepsilon$, the infinite sum (7) will be truncated
after term $N_T$ such that

$$\varepsilon \ge 1 - \sum_{n=0}^{N_T}p(n,qt). \qquad (13)$$
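As an illustration of criterion (13), the following minimal sketch searches for the smallest truncation point for a given value of $qt$ and tolerance. The function names `poisson_pmf` and `truncation_point` and the iteration cap `n_max` are hypothetical and not part of the paper; the Poisson weights are evaluated in log space only to postpone underflow, and the plain running sum limits the usable tolerance to roughly $10^{-12}$.

```python
import math


def poisson_pmf(n: int, qt: float) -> float:
    """Poisson weight p(n, qt) = exp(-qt) (qt)^n / n!, evaluated in log space."""
    if qt == 0.0:
        return 1.0 if n == 0 else 0.0
    return math.exp(-qt + n * math.log(qt) - math.lgamma(n + 1))


def truncation_point(qt: float, eps: float, n_max: int = 10_000_000) -> int:
    """Smallest N_T such that 1 - sum_{n=0}^{N_T} p(n, qt) <= eps (criterion (13))."""
    cumulative = 0.0
    for n in range(n_max + 1):
        cumulative += poisson_pmf(n, qt)
        if 1.0 - cumulative <= eps:
            return n
    raise RuntimeError("tolerance not reached; increase n_max or loosen eps")
```

For instance, `truncation_point(50.0, 1e-10)` returns a value well above $qt = 50$, which is the behaviour the complexity discussion relies on.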

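A first strategy consists in truncating the infinite sums $\Pi(\theta,t)$, $L(\theta,t)$, and
$SL(\theta,t)$ at the same point, that is to say $N_L = N_S = N_T$. From relations (12) and
(11), we have:

$$E_Y \le t\,\varepsilon\,\|R\|_\infty \qquad (14)$$

and

$$F_Y \le \frac{t^2}{2}\,\|R\|_\infty\left\|\frac{\partial}{\partial\theta}Q(\theta)\right\|_\infty\left[\varepsilon + p(N_T,qt)\right]. \qquad (15)$$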



Another strategy allows the computation of $E[Y(\theta,t)]$ with an absolute maximal
error $\varepsilon_Y$. This is equivalent to setting

$$\varepsilon = \frac{\varepsilon_Y}{t\,\|R\|_\infty}.$$

Note that it is also possible to compute $\frac{\partial}{\partial\theta}E[Y(\theta,t)]$ after truncation at $N_S$,
i.e., by bounding relation (11) by a given tolerance error $\varepsilon_S$. It is clear that
$\varepsilon < \varepsilon_Y < \varepsilon_S$ and that the deviations become important when the mission time $t$
increases. The values of $N_T$ and $N_S$ may then be much greater than $qt$ and
the time complexity may rise considerably. To avoid that, we consider the first
strategy.

The computation of $\frac{\partial}{\partial\theta}E[Y(\theta,t)]$ requires essentially 3 vector-matrix
products per iteration. The time complexity of the SU method is then $O(3N_TM^2)$.
Using a compact storage of $Q(\theta)$ and its derivative $\frac{\partial}{\partial\theta}Q(\theta)$, that time
complexity may be reduced to $O(N_T(2\eta + \eta_S))$ where $\eta$ and $\eta_S$ denote the number of
non-null elements in $Q(\theta)$ and $\frac{\partial}{\partial\theta}Q(\theta)$. When the stiffness (thus $qt$) and the
state space cardinality $M$ increase, the computation time becomes prohibitive
because from (13), $N_T > qt$. The following method we propose deals efficiently
with that class of problems.
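The SU computation described in this section can be sketched as follows. This is a dense-matrix illustration under stated assumptions, not the authors' implementation: the function name, the choice $q = 1.05\max_i|q_{ii}|$ and the iteration cap are hypothetical, $q$ is treated as independent of $\theta$, and the generator is assumed to have at least one non-zero diagonal entry. The loop accumulates series (8) and (9), uses recursion (10), and stops according to (13).

```python
import math

import numpy as np


def su_ear_and_sensitivity(Q, dQ, pi0, R, t, eps=1e-10, max_terms=10_000_000):
    """Return (EAR, dEAR/dtheta, N_T) from the truncated series (8) and (9)."""
    Q = np.asarray(Q, dtype=float)
    dQ = np.asarray(dQ, dtype=float)
    R = np.asarray(R, dtype=float)
    pi_k = np.asarray(pi0, dtype=float).copy()   # Pi^(k), starting at Pi(0)
    if t <= 0.0:
        raise ValueError("mission time t must be positive")
    M = Q.shape[0]
    q = 1.05 * np.max(np.abs(np.diag(Q)))        # assumed uniformization rate
    P = np.eye(M) + Q / q
    dP = dQ / q                                  # q does not depend on theta
    qt = q * t
    dpi_k = np.zeros(M)                          # d Pi^(k) / d theta
    sum_pi = np.zeros(M)                         # sum_{k<=n} Pi^(k)
    sum_dpi = np.zeros(M)                        # sum_{k<=n} d Pi^(k) / d theta
    L = np.zeros(M)
    SL = np.zeros(M)
    log_p = -qt                                  # log p(0, qt)
    cumulative = 0.0
    for n in range(max_terms):
        sum_pi += pi_k
        sum_dpi += dpi_k
        p = math.exp(log_p)
        L += (p / (n + 1)) * sum_pi
        SL += (p / (n + 1)) * sum_dpi
        cumulative += p
        if 1.0 - cumulative <= eps:              # stopping rule (13)
            return t * (L @ R), t * (SL @ R), n
        # Recursion (10): three vector-matrix products per iteration.
        dpi_k = dpi_k @ P + pi_k @ dP
        pi_k = pi_k @ P
        log_p += math.log(qt) - math.log(n + 1)
    raise RuntimeError("series did not reach the requested tolerance")
```

On a two-state model with failure rate $\theta$ and repair rate $\mu$, the result can be compared with the closed-form EAR and with a finite-difference derivative, which is a convenient sanity check before moving to larger models.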

4 The IRK3 Method

Generally, ODE methods apply to systems of equations of the form
$y'(t) = f(t, y(t))$. They consist in dividing the solution interval $[0,t]$ into
$\{0 = t_0, \ldots, t_n = t\}$ and computing an approximated solution of the unknown
function $y(t)$ at each point $t_i$, $i \ge 1$. Let $y(t_i)$ (resp. $y_i$) be the exact (resp.
approximated) solution of the differential equation at $t_i$. The stepsize is defined
as $h_i = t_{i+1} - t_i$.
In order to deal with the stiffness, we consider an L-stable (or stiffly
stable) ODE method such as IRK3. More details on ODE methods and the L-stability
property may be found in [5]. The integration of system (3) gives

$$\frac{\partial}{\partial t}L(\theta,t) = L(\theta,t)Q(\theta) + \Pi(0); \qquad L(\theta,0) = L(0) = 0. \qquad (16)$$

Differentiating this relation with respect to $\theta$, we obtain

$$\frac{\partial}{\partial t}SL(\theta,t) = SL(\theta,t)Q(\theta) + L(\theta,t)\frac{\partial}{\partial\theta}Q(\theta); \qquad SL(\theta,0) = SL(0) = 0. \qquad (17)$$



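Putting together (16) and (17), we get the following ODE system of the form
$y' = Ay$

$$\frac{\partial}{\partial t}V(\theta,t) = V(\theta,t)B(\theta) \qquad (18)$$

where $V(\theta,t) = (SL(\theta,t)\ \ L(\theta,t)\ \ 1)$ and

$$B(\theta) = \begin{pmatrix} Q(\theta) & 0 & 0 \\ \frac{\partial}{\partial\theta}Q(\theta) & Q(\theta) & 0 \\ 0 & \Pi(0) & 0 \end{pmatrix}.$$

The initial condition is such that $V(\theta,0) = (0\ \ 0\ \ 1)$.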
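The block structure of $B(\theta)$ can be checked numerically with a short sketch. The helper name `build_B` is hypothetical; the code only assembles the $(2M+1)\times(2M+1)$ matrix of system (18) from $Q$, its derivative and the initial distribution, so that the row vector $V = (SL\ \ L\ \ 1)$ multiplied by $B$ reproduces the right-hand sides of (17) and (16).

```python
import numpy as np


def build_B(Q, dQ, pi0):
    """Assemble the block matrix B(theta) of system (18)."""
    Q = np.asarray(Q, dtype=float)
    dQ = np.asarray(dQ, dtype=float)
    pi0 = np.asarray(pi0, dtype=float)
    M = Q.shape[0]
    B = np.zeros((2 * M + 1, 2 * M + 1))
    B[:M, :M] = Q                # SL block: SL * Q
    B[M:2 * M, :M] = dQ          # coupling:  L * dQ/dtheta
    B[M:2 * M, M:2 * M] = Q      # L block:   L * Q
    B[2 * M, M:2 * M] = pi0      # constant term Pi(0), carried by the trailing 1
    return B
```

The last column of `B` is zero, so the trailing component of $V$ stays equal to 1 for all times, which is what turns the non-homogeneous equations into a homogeneous system.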

Applied to equation (18), the IRK3 method gives $V_{i+1}$ as solution of the
linear system of equations

$$V_{i+1}\left[I - \frac{2}{3}h_iB(\theta) + \frac{1}{6}h_i^2B(\theta)^2\right] = V_i\left[I + \frac{1}{3}h_iB(\theta)\right]. \qquad (19)$$

At time $t + h_i$, the local error vector is $\varepsilon(h_i) = \frac{h_i^4}{72}V(t)B(\theta)^4$. The Local
Truncation Error ($LTE_i$) is its norm (the infinite norm for example). At each
step $i$, $LTE_i$ must be less than a specified tolerance $\tau$. The stepsize $h_i$ must
satisfy $h_{\min} \le h_i \le h_{\max}$; the bounds $h_{\min}$ and $h_{\max}$ are fixed to avoid too
many steps (if $h_i$ is too small) and a bad precision (if $h_i$ is too large). For
example, in [6], $h_{\min}$ = 10“^ and $h_{\max} = 10$. Moreover, $h_i$ is chosen such that
the solution at $t + h_i$ meets the specified tolerance. A commonly used technique
is

$$h_i \le h_{i-1}\left(\frac{\tau}{LTE_i}\right)^{\frac{1}{r+1}} \qquad (20)$$

where $r$ is the order of the method ($r = 3$).
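A compact sketch of one way to organise such an integration is given below. It assumes the third-order scheme in the rational form $V_{i+1}[I - \frac{2}{3}hB + \frac{1}{6}h^2B^2] = V_i[I + \frac{1}{3}hB]$ with local error $\frac{h^4}{72}VB^4$, and it picks each stepsize from the error estimate before solving the linear system, clipping it to $[h_{\min}, h_{\max}]$. The function name, default tolerances and the dense direct solver are assumptions made for illustration; a sparse iterative solver would be used for large models.

```python
import numpy as np


def irk3_integrate(B, V0, t_end, tol=1e-10, h0=1e-3, h_min=1e-7, h_max=10.0):
    """Integrate V' = V B from 0 to t_end; returns (V(t_end), number of steps)."""
    B = np.asarray(B, dtype=float)
    n = B.shape[0]
    identity = np.eye(n)
    B2 = B @ B
    B4 = B2 @ B2
    V = np.asarray(V0, dtype=float).copy()
    t, h, steps = 0.0, h0, 0
    while t_end - t > 1e-12 * t_end:
        # Local truncation error that the previous stepsize would produce now.
        lte = h**4 / 72.0 * np.max(np.abs(V @ B4))
        # Stepsize rule (20) with r = 3, applied before solving the system.
        h = h * (tol / lte) ** 0.25 if lte > 0.0 else h_max
        h = min(max(h, h_min), h_max, t_end - t)
        lhs = identity - (2.0 * h / 3.0) * B + (h * h / 6.0) * B2
        rhs = V @ (identity + (h / 3.0) * B)
        # Row-vector system x @ lhs = rhs, solved through its transpose.
        V = np.linalg.solve(lhs.T, rhs)
        t += h
        steps += 1
    return V, steps
```

Because the stepsize is fixed from quantities already known at the start of the step, no linear solve is discarded by a rejected step in this sketch.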

Usually, the $LTE_i$ computation is done after solving the linear system (19)
and must satisfy $LTE_i < \tau$. It is important to note that when a step is rejected,
the computation time spent solving the system is wasted. When the stepsize
is small (e.g. $h_0$ = 10“^), it is accepted for the following steps, increasing the
execution time. To avoid these drawbacks, we considered the expression of the
error vector $\varepsilon(h_i)$. We observed that, at any step, it only depends upon some
known variables of the previous step. At each step, $LTE_i$ may be calculated
first and the optimal stepsize chosen by using the formula (20). When doing so,
we automatically have the biggest $h_i$ for which $LTE_i < \tau$. The stepsize $h_i$
is rejected if it is bigger than $h_{\max}$, in which case we take it to be $h_{\max}$, or if
it is less than $h_{\min}$, in which case it is set to $h_{\min}$. With this technique, we got
stepsizes varying from 0.1 to 23, with an initial stepsize $h_0$ = 10“^, a mission
time $t$ = 10® and state space size $M = 100$; the average stepsize was 1.5 (see
following section). Let us note that the average stepsize decreases when $M$ increases.
Thus, setting $t$ = 10® and $M = 400$, the average stepsize becomes 0.99.

The IRK3 method requires essentially the resolution of the linear system of
equations (19). The square matrix $B(\theta)$ is of order $2M+1$. Very often, it is stored
with a compact scheme and the system is solved using an iterative method like
Gauss-Seidel. The time complexity depends on the number of steps, denoted
by $p$, and the average number of iterations per step, $I$. Let $\eta'$ be the number
of non-null elements in the matrix of system (19); the time complexity of the
IRK3 method is then $O(Ip\eta')$.

5 Numerical Results 

We consider a fault-tolerant multiprocessor system including $n$ processors and $b$
buffer stages. The system is modelled as an M/M/n/n+b queuing system. Jobs
arrive at rate $\Lambda$ and are lost when the buffer is full. The job service rate is
O. Processors (resp. buffer stages) fail independently at rate $\lambda$ (resp. $\gamma$) and
are repaired singly with rate $\mu$ (resp. $\tau$). Processor failure causes a graceful
degradation of the system (the number of processors is decreased by one). The
system is in a failed state when all processors have failed or any buffer stage has
failed. No additional processor failures are assumed to occur when the system is
in a failed state. The model is represented by a CTMC with the state-transition
diagram shown in figure 1. At any given time the state of the system is $(i,j)$
where $0 \le i \le n$ is the number of nonfailed processors, and $j$ is zero if any of
the buffer stages has failed, otherwise it is one. An appropriate reward rate in a
given state is the steady-state throughput of the system with a given number of
nonfailed processors [7]. The reward rate is zero in any system failure state. In
this experiment, the number of states is $M = 2(n + 1)$. We shall choose $\lambda = \gamma$ =
10~® per hour and $\mu = \tau = 100$ per hour, in order to produce an extremely stiff
Markov process. The numerical results are obtained by executing the algorithms
on a SUN Ultra-30, 295 MHz station in double precision. For the SU
method, the tolerance $\varepsilon$ is 10“^° for all the values of $t$. The local tolerance
$\tau$ for the IRK3 method is also set to 10“^°. These values give an acceptable
precision for both of the presented methods [8].

Fig. 1. State-transition diagram for an n-processor system

First of all, the EAR sensitivity was computed for a moderate state space
cardinality, $M = 10$. The mission time $t$ was varied from 1 to 10® hours. We
concluded that, in this case, the SU method performs very well. When we
increased the number of states to $M = 100$, the SU method was better only for $t$
less than 100 (figure 2). Beyond that limit, the IRK3 method was faster than the
SU technique. To show how well the IRK3 method copes with large values of $M$ and $t$,
we executed it for $M = 400$ and $t$ still varying from 1 to 10® hours. The results
are plotted in figure 3. We found that the computation of the EAR sensitivity
for $M = 400$ and $t$ = 10® took about 27 hours of CPU time when using the IRK3
method, while it was practically infeasible with the SU method. We conclude that
even if it allows global error control, the SU method remains usable only for
moderate values of the mission time. When stiffness and mission time increase, the
IRK3 method may be recommended.

Fig. 3. CPU time vs mission time for M = 400
Abstract. We investigate interesting spectral properties of circulant
matrices with a band structure by analyzing the roots of an associated
polynomial. We also derive practical conditions about the curve contain- 
ing the eigenvalues of the matrix which can be used to study the stability 
domain of some numerical methods for the solution of ODEs. 
1 Introduction

Circulant matrices are a quite useful tool of linear algebra to emphasize period- 
ical behaviour of several phenomena. This means that circulant matrices often 
arise in problems of physics, probability and statistics, geometry, and numerical 
analysis. 

The number of known properties of this class of matrices is enormous [5]. 
The main one is that any operation (sum, product, inverse, transpose) involving 
circulant matrices still gives a circulant matrix. For this reason, when working 
with Toeplitz matrices which constitute a larger class of matrices, one often 
resorts to circulant matrices [4,7]. 

Circulant matrices are a subclass of normal matrices which are diagonalized 
by the Fourier matrix. The eigenvalues of such matrices are on a curve which 
can be obtained by means of an explicit formula involving its elements. 

This last property allows us to relate circulant band matrices with some
numerical methods for the solution of ordinary differential equations. Let us
consider a $k$-step linear multistep method

$$\sum_{j=-\nu}^{k-\nu}\alpha_j y_{n+j} = h\sum_{j=-\nu}^{k-\nu}\beta_j f(t_{n+j}, y_{n+j}) \qquad (1)$$

with $\nu$ initial and $k - \nu$ final conditions. The idea of selecting a number of initial
conditions different from $\nu = k$ has been used to define Boundary Value Methods
* Work supported by MURST. 
(BVMs) which constitute an important class of methods for the solution of initial
value problems [3,6]. By considering the functions

$$\rho(z) = \sum_{j=-\nu}^{k-\nu}\alpha_j z^j, \qquad \sigma(z) = \sum_{j=-\nu}^{k-\nu}\beta_j z^j, \qquad (2)$$

the boundary locus (the boundary of the linear stability domain) of (1) is given
by the curve of the complex plane $\rho(e^{i\theta})/\sigma(e^{i\theta})$ for $\theta \in [0, 2\pi]$ ($i$ is the imaginary
unit).

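The functions $\rho(e^{i\theta})$ and $\sigma(e^{i\theta})$ also represent the curves containing the
eigenvalues of the circulant band matrices

$$A = \begin{pmatrix}
\alpha_0 & \alpha_1 & \cdots & \alpha_{k-\nu} & 0 & \cdots & 0 & \alpha_{-\nu} & \cdots & \alpha_{-1} \\
\alpha_{-1} & \alpha_0 & \alpha_1 & \cdots & \alpha_{k-\nu} & 0 & \cdots & 0 & \ddots & \vdots \\
\vdots & \ddots & \ddots & \ddots & & \ddots & \ddots & & \ddots & \alpha_{-\nu} \\
\vdots & & \ddots & \ddots & \ddots & & \ddots & \ddots & & \vdots \\
\alpha_1 & \cdots & \alpha_{k-\nu} & 0 & \cdots & 0 & \alpha_{-\nu} & \cdots & \alpha_{-1} & \alpha_0
\end{pmatrix} \qquad (3)$$

and $B$ defined analogously. The matrix (3) is banded since we suppose that $k$
is much smaller than the dimension of the matrix itself. The above consideration
implies that the boundary locus coincides with the curve containing the
eigenvalues of $B^{-1}A$.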



Example 1. The explicit Euler method $y_n = y_{n-1} + hf(t_{n-1}, y_{n-1})$ has a stability
domain given by the circle with center $(-1,0)$ and radius 1. The same curve may
be obtained from the matrices

$$A = \begin{pmatrix} 1 & & & -1 \\ -1 & 1 & & \\ & \ddots & \ddots & \\ & & -1 & 1 \end{pmatrix}, \qquad
B = \begin{pmatrix} 0 & & & 1 \\ 1 & 0 & & \\ & \ddots & \ddots & \\ & & 1 & 0 \end{pmatrix}$$

by considering the spectrum of the family of matrices

$$B^{-1}A = \begin{pmatrix} -1 & 1 & & \\ & -1 & \ddots & \\ & & \ddots & 1 \\ 1 & & & -1 \end{pmatrix}.$$

As a further example the trapezoidal rule has an unbounded stability domain
because the matrix $B$ is singular.
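Example 1 is easy to reproduce numerically. The sketch below, with the hypothetical helper name `euler_locus_eigenvalues`, builds the two circulant matrices for the explicit Euler method (ones on the diagonal and $-1$ on the cyclic subdiagonal for $A$, ones on the cyclic subdiagonal for $B$) and returns the spectrum of $B^{-1}A$, whose points should all lie on the circle $|\lambda + 1| = 1$.

```python
import numpy as np


def euler_locus_eigenvalues(n: int) -> np.ndarray:
    """Eigenvalues of B^{-1} A for the explicit Euler method, matrices of size n."""
    # Cyclic down-shift: ones on the subdiagonal and in the top-right corner.
    shift = np.roll(np.eye(n), 1, axis=0)
    A = np.eye(n) - shift
    B = shift
    return np.linalg.eigvals(np.linalg.solve(B, A))
```

Increasing `n` only adds more points on the same circle, which is why the curve containing the eigenvalues can be read as the boundary locus.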

In analogy with what has been done with the linear multistep method (1),
in this paper we analyze some properties of circulant band matrices (3) by using
the information given by the roots of their associated polynomials

$$p(z) = z^{\nu}\rho(z), \qquad s(z) = z^{\nu}\sigma(z). \qquad (4)$$

Our aim is also to derive in a simple way some important properties about the
boundary locus of linear multistep methods.



2 Conditioning of Circulant Band Matrices 

From the study of the conditioning of Toeplitz band matrices [1], it has been
derived that the family of matrices

$$\begin{pmatrix}
\alpha_0 & \alpha_1 & \cdots & \alpha_{k-\nu} & & \\
\alpha_{-1} & \alpha_0 & \ddots & & \ddots & \\
\vdots & \ddots & \ddots & \ddots & & \alpha_{k-\nu} \\
\alpha_{-\nu} & & \ddots & \ddots & \ddots & \vdots \\
 & \ddots & & \ddots & \alpha_0 & \alpha_1 \\
 & & \alpha_{-\nu} & \cdots & \alpha_{-1} & \alpha_0
\end{pmatrix}_{n \times n} \qquad (5)$$

is well conditioned (the condition numbers are uniformly bounded with respect
to $n$) if the associated polynomial (4) has $\nu$ roots of modulus smaller than 1
and the remaining of modulus larger than 1. On the other hand, it is weakly
well conditioned (the condition numbers grow as a small power of $n$) if (4) has
exactly either $\nu$ roots of modulus smaller than 1 or $k - \nu$ of modulus larger than
1, i.e., possible roots of unit modulus are all among the first $\nu$ or the remaining
$k - \nu$.

The same properties cannot be generalized to a family of nonsingular
circulant band matrices since the condition number of any matrix in this class is
independent of the size of the matrix. Anyway, by considering the matrix (3)
which is generated by the same elements of the corresponding Toeplitz matrix
(5), a number of interesting properties can be derived. Let us start from the
following basic results, whose proof follows by straightforward calculation:
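Theorem 1. Let $A$ be the circulant matrix (3) and $p(z)$ its associated
polynomial as defined in (4). If $z_1, z_2, \ldots, z_k$ are the roots of $p$, then $A$ may be
decomposed in the form

$$A = \alpha_{k-\nu}\prod_{j=1}^{\nu}C_j\prod_{j=\nu+1}^{k}E_j,$$

where $C_j$ and $E_j$ are the following elementary matrices

$$C_j = \begin{pmatrix} 1 & & & -z_j \\ -z_j & 1 & & \\ & \ddots & \ddots & \\ & & -z_j & 1 \end{pmatrix}, \qquad
E_j = \begin{pmatrix} -z_j & 1 & & \\ & -z_j & \ddots & \\ & & \ddots & 1 \\ 1 & & & -z_j \end{pmatrix}. \qquad (6)$$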



The eigenvalues of an elementary matrix are on the circle centered at the
diagonal element of the matrix with radius equal to the modulus of the
off-diagonal element. Indeed, the eigenvalues of the matrices $C_j$ and $E_j$ are,
respectively, $\lambda = 1 - z_jw$ for $j = 1, \ldots, \nu$, and $\lambda = -z_j + w$ for $j = \nu+1, \ldots, k$,
where $w$ is an $n$th root of unity. For what concerns the eigenvalues
of the matrix $A$, the following result holds:

Theorem 2. The eigenvalues of the matrix $A$ in (3) are given by

$$\lambda_l = \alpha_{k-\nu}\prod_{j=1}^{k}\lambda_l^{(j)}, \qquad l = 0, \ldots, n-1,$$

where $\lambda_l^{(j)}$ are the eigenvalues of the elementary matrices (6).

Proof. The thesis follows from the fact that the eigenvector corresponding to the
eigenvalue $\lambda_l^{(j)}$ is the $l$th column of the Fourier matrix. Therefore, any eigenvalue
of the product of circulant matrices $C_j$ and $E_j$ is given by the product of the
corresponding eigenvalues. □
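The product formula can be checked numerically. The sketch below is an illustration under stated assumptions: the coefficient array `alpha` stores $\alpha_{-\nu}, \ldots, \alpha_{k-\nu}$ in that order, the associated polynomial is taken as $p(z) = \sum_j \alpha_j z^{j+\nu}$, and the helper names are hypothetical. For each $n$th root of unity $w_l$ the eigenvalue is computed as $\alpha_{k-\nu}\,w_l^{-\nu}\prod_j (w_l - z_j)$, which groups the factors $1 - z_j/w_l$ and $w_l - z_j$ of the elementary matrices.

```python
import numpy as np


def circulant_band(alpha, nu, n):
    """Circulant band matrix with entry alpha_j at column (row + j) mod n."""
    A = np.zeros((n, n))
    rows = np.arange(n)
    for m, a in enumerate(alpha):          # alpha[m] holds alpha_{m - nu}
        A[rows, (rows + m - nu) % n] += a
    return A


def eigenvalues_from_roots(alpha, nu, n):
    """Eigenvalues from the roots of the associated polynomial."""
    alpha = np.asarray(alpha, dtype=float)
    roots = np.roots(alpha[::-1])          # np.roots expects the leading coefficient first
    w = np.exp(2j * np.pi * np.arange(n) / n)
    return alpha[-1] * w ** (-nu) * np.prod(w[:, None] - roots[None, :], axis=1)
```

Each value returned by `eigenvalues_from_roots` can be paired with the matching column of the Fourier matrix and tested against `circulant_band(alpha, nu, n)`, which avoids any sorting of complex eigenvalues.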

We are now in a position to easily derive the following 

Corollary 1. The family of circulant matrices (3) is nonsingular if the associated
polynomial (4) has no roots of unit modulus. The condition number of (3)
depends on the distance of the roots from the unit circumference.

We observe that if one root is equal to 1, then the circulant matrix is always 
singular since the corresponding matrix Cj or Ej is singular. 

A complementary result to that of Corollary 1 should be the calculation of
the minimum eigenvalue which corresponds to the 2-norm of the inverse of $A$. A
practical criterion can be derived by analyzing the function

$$f(\theta) = \sum_{j=-\nu}^{k-\nu}\alpha_j\cos(j\theta). \qquad (7)$$
If $f$ is strictly monotone in $(0, \pi)$ and, in addition, $f(0)$ and $f(\pi)$ have the same
sign, the following two properties can be deduced:

- the curve containing the eigenvalues entirely lies in the real positive or in
the real negative half plane;

- $|\lambda_{\min}| = \min(|f(0)|, |f(\pi)|)$.

The first property is quite useful to check whether the boundary locus of a
linear multistep method (1) is entirely in the real positive half plane and,
therefore, the method is $A_{\nu,k-\nu}$-stable (a property that corresponds to A-stability for
linear multistep methods (1) with $\nu = k$ [3]).

To obtain practical conditions ensuring the monotonicity of $f(\theta)$ it is
convenient to use just the coefficients $\alpha_j$ rather than the roots of $p$. By considering the
variable transformation $\cos\theta = t$, function (7) can be recast as the polynomial,
of degree $k - \nu$, $f(t)$. Since the function $f(\theta)$ is strictly monotone if and only
if $f'(\theta) \ne 0$ in $(0, \pi)$, that is $f'(t) \ne 0$ for $t \in (-1,1)$, we need to check that all
the roots of $f'(t)$ are greater than 1 in modulus.

From the usual substitutions $\cos\theta = t$ and $\cos(n+1)\theta = T_{n+1}(t) = 2tT_n(t) -
T_{n-1}(t)$ we obtain the following expression for $f'(t)$ associated to the matrix with
bandwidth $\max(\nu, k - \nu) \le 5$:

$$\begin{aligned}
f'(t) = {} & 80(\alpha_5 + \alpha_{-5})t^4 + 32(\alpha_4 + \alpha_{-4})t^3 + 12(\alpha_3 + \alpha_{-3} - 5(\alpha_5 + \alpha_{-5}))t^2 \\
& + 4(\alpha_2 + \alpha_{-2} - 4(\alpha_4 + \alpha_{-4}))t + \alpha_1 + \alpha_{-1} - 3(\alpha_3 + \alpha_{-3}) + 5(\alpha_5 + \alpha_{-5}).
\end{aligned}$$

As an example, almost circulant tridiagonal matrices satisfy $f'(t) \ne 0$ when
(by considering $\alpha_2 = \ldots = \alpha_5 = \alpha_{-2} = \ldots = \alpha_{-5} = 0$ in the above expression
for $f'(t)$) $\alpha_1 \ne -\alpha_{-1}$, while the coefficients of almost pentadiagonal circulant
matrices need to satisfy

$$|\alpha_1 + \alpha_{-1}| > 4\,|\alpha_2 + \alpha_{-2}|.$$

For matrices with a bandwidth larger than 2, the above condition should be
checked numerically.
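One possible way to carry out such a numerical check is sketched below with NumPy's Chebyshev utilities. The function names are hypothetical and the coefficient array `alpha` is assumed to store $\alpha_{-\nu}, \ldots, \alpha_{k-\nu}$ in that order. Since $\cos(j\theta) = T_{|j|}(\cos\theta)$, the coefficients $\alpha_j$ and $\alpha_{-j}$ are summed onto $T_{|j|}$, the series is differentiated, and the roots of $f'(t)$ are compared with the unit disc.

```python
import numpy as np
from numpy.polynomial import chebyshev as cheb
from numpy.polynomial import polynomial as poly


def fprime_power_coeffs(alpha, nu):
    """Power-basis coefficients of f'(t), lowest degree first."""
    k = len(alpha) - 1
    c = np.zeros(max(nu, k - nu) + 1)
    for m, a in enumerate(alpha):          # alpha[m] holds alpha_{m - nu}
        c[abs(m - nu)] += a                # cos is even: alpha_j and alpha_{-j} share T_|j|
    return cheb.cheb2poly(cheb.chebder(c))


def roots_outside_unit_disc(alpha, nu):
    """True if f'(t) is not identically zero and all its roots have modulus > 1."""
    coeffs = fprime_power_coeffs(alpha, nu)
    if not np.any(coeffs):
        return False
    return bool(np.all(np.abs(poly.polyroots(coeffs)) > 1.0))
```

For a pentadiagonal example such as `roots_outside_unit_disc([0.1, 1.0, 5.0, 1.0, 0.1], 2)` the answer agrees with the inequality $|\alpha_1 + \alpha_{-1}| > 4|\alpha_2 + \alpha_{-2}|$.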



3 Stability Domain of ODE Methods 

In this section we analyze the boundary locus of some known linear multistep 
methods used as BVMs by using the properties of their associated circulant band 
matrix given in the previous section. The obtained results are in general not new, 
but are here re-derived in a quite simple way. 

Since each linear multistep method (1) satisfies $\rho(1) = 0$, then the
corresponding matrix $A$ as defined in (3) is always singular and the curve containing
the eigenvalues crosses the origin of the complex plane.

By recalling that the boundary locus of a linear multistep method (1) is
equivalent to the curve representing the spectrum of $B^{-1}A$, where $B$ is associated
to the function $\sigma(z)$, from Corollary 1 it is sufficient that one root of $\sigma(z)$ is equal
to 1 in modulus in order to obtain an unbounded boundary locus. This is the
case, for example, of the Extended Trapezoidal Rules of the second kind [3]

$$\sum_{j=-\nu}^{k-\nu}\alpha_j y_{n+j} = \frac{h}{2}\left(f(t_{n+1}, y_{n+1}) + f(t_n, y_n)\right),$$

where the coefficients $\alpha_j$ are chosen in order to obtain the maximum attainable
order.

On the other hand, the obtained results are not useful to state that these
methods are perfectly $A_{\nu,k-\nu}$-stable (the boundary locus coincides with the
imaginary axis).

The previous methods can be generalized by using any value of $\nu$ and still
have an unbounded boundary locus. In general they are used as initial and final
methods to obtain a BVM (see [3]).

A different family of methods that is easy to analyze is that of GBDFs
(Generalized BDFs, see [2]) defined as ($l = 1, 2$)

$$\sum_{j=-\nu}^{\nu-l}\alpha_j y_{n+j} = hf(t_n, y_n). \qquad (8)$$

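This family of methods has the matrix $B$ equal to the identity matrix and
hence the associated boundary loci are bounded curves. Here it is more
convenient to use the variable change $\cos\theta = t + 1$ since the obtained polynomial,
expressed for $\nu = 5$ and $l = 1$ by means of the formula ($k = 2\nu - 1$)

$$\begin{aligned}
f_k(t) = {} & 16\alpha_{-5}t^5 + 8(\alpha_4 + \alpha_{-4} + 10\alpha_{-5})t^4 + 4(\alpha_3 + \alpha_{-3} + 8(\alpha_4 + \alpha_{-4}) + 35\alpha_{-5})t^3 \\
& + 2(\alpha_2 + \alpha_{-2} + 6(\alpha_3 + \alpha_{-3}) + 20(\alpha_4 + \alpha_{-4}) + 50\alpha_{-5})t^2 \\
& + \sum_{i=-5}^{4}i^2\alpha_i\,t + \sum_{i=-5}^{4}\alpha_i = \sum_{i=0}^{5}a_it^i,
\end{aligned}$$

has in general all the coefficients $a_0 = \ldots = a_{\nu-1} = 0$. In fact, formula (8) has
order $2\nu - l$ and hence, among the others, the conditions

$$\sum_{j=-\nu}^{\nu-l}j^{2s}\alpha_j = 0, \qquad s = 0, \ldots, \nu - 1, \qquad (9)$$

must be satisfied. Conditions (9) are expressed in matrix form by the following
homogeneous linear system

$$\begin{pmatrix}
1 & 1 & 1 & \cdots & 1 \\
0 & 1 & 4 & \cdots & \nu^2 \\
0 & 1 & 16 & \cdots & \nu^4 \\
\vdots & \vdots & \vdots & & \vdots \\
0 & 1 & 2^{2(\nu-1)} & \cdots & \nu^{2(\nu-1)}
\end{pmatrix}
\begin{pmatrix} \alpha_0 \\ \alpha_{-1} + \alpha_1 \\ \alpha_{-2} + \alpha_2 \\ \vdots \\ \alpha_{-\nu} + \alpha_\nu \end{pmatrix}
= \begin{pmatrix} 0 \\ 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix}$$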
where $\alpha_\nu = 0$ and, if $l = 2$, also $\alpha_{\nu-1} = 0$. By applying Gaussian elimination to
the above system one has (the coefficient matrix is $\nu \times (\nu + 1)$)





1 .. 


1 


1 \ 




/ ao \ 






1 


4 .. 








cr_i -f CKi 




0 




12 .. 


,. iy{iy-l)‘^{iy-2) 


_ 1) 




a_2 -|- Q!2 


= 


0 


V 




l<y 


/ 




\ Q!_jy -|- ai, J 




\o/ 



where is a constant which depends on the size of the matrix, and, by considering
a suitable row scaling, one then obtains $\nu$ linear combinations of the
coefficients $\alpha_i$ corresponding to the previously stated $a_i = 0$, for $i = 0, \ldots, \nu - 1$ (this
was proved by direct computation up to $\nu = 15$). Therefore, $f_k(t) = a_\nu t^\nu$,
that is $f_k(\theta) = 2^{\nu-1}\alpha_{-\nu}(\cos\theta - 1)^\nu$.

The value $\theta = 0$ is the only root of $f_k$ and this means that $f_k$ is strictly
monotone in $(0, \pi)$. Moreover, since $f_k(\pi) > 0$, all the methods are $A_{\nu,k-\nu}$-stable.
We observe that the higher the multiplicity of $\theta = 0$ as root of $f_k$, the more
the boundary locus of the GBDF is flattened on the imaginary axis (see Fig. 1).
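The identity $f_k(\theta) = 2^{\nu-1}\alpha_{-\nu}(\cos\theta - 1)^\nu$ can be tested numerically for small $\nu$. The sketch below rests on assumptions that should be read as such: the GBDF coefficients are obtained by imposing the order conditions $\sum_j j^m\alpha_j = \delta_{m,1}$ for $m = 0, \ldots, k$ with $k = 2\nu - l$, solved as a dense Vandermonde system (adequate only for small $\nu$), and the function names are hypothetical.

```python
import numpy as np


def gbdf_coefficients(nu: int, l: int = 1):
    """Indices j = -nu..k-nu and coefficients alpha_j of the order-k GBDF, k = 2 nu - l."""
    k = 2 * nu - l
    j = np.arange(-nu, k - nu + 1)
    powers = np.arange(k + 1)
    vandermonde = j[None, :].astype(float) ** powers[:, None]   # row m holds j^m
    rhs = np.zeros(k + 1)
    rhs[1] = 1.0            # consistency with the right-hand side h f(t_n, y_n)
    return j, np.linalg.solve(vandermonde, rhs)


def real_part_on_locus(j, alpha, theta):
    """f(theta) = sum_j alpha_j cos(j theta), the real part of the boundary locus."""
    return np.cos(np.outer(theta, j)) @ alpha
```

For $\nu = 2$ and $l = 1$ this gives the coefficients $(1, -6, 3, 2)/6$, and the real part of the locus equals $\frac{1}{3}(\cos\theta - 1)^2$, in agreement with the formula.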




Fig. 1. Boundary locus of GBDFs for k = 1, . . . , 6 





Spectral Properties of Circulant Band Matrices Arising in ODE Methods 






References 

1. P. Amodio, L. Brugnano, The conditioning of Toeplitz band matrices, Math.
Comput. Modelling 23 (10) (1996), 29-42.

2. L. Brugnano, D. Trigiante, Convergence and stability of Boundary Value Methods,
J. Comput. Appl. Math. 66 (1996), 97-109.

3. L. Brugnano, D. Trigiante, Solving ODEs by Linear Multistep Initial and Boundary
Value Methods, Gordon & Breach, Amsterdam, (1998).

4. T. F. Chan, An optimal circulant preconditioner for Toeplitz systems, SIAM J. Sci.
Stat. Comput. 9 (1988), 766-771.

5. P. J. Davis, Circulant matrices, John Wiley & Sons, New York, (1979).

6. F. Iavernaro, F. Mazzia, Block-Boundary Value Methods for the solution of
Ordinary Differential Equations, SIAM J. Sci. Comput. 21 (1999), 323-339.

7. V. V. Strela and E. E. Tyrtyshnikov, Which circulant preconditioner is better?,
Math. Comput. 65 (213) (1996), 137-150.



A Parameter Robust Method for a Problem 
with a Symmetry Boundary Layer* 

Ali R. Ansari^1, Alan F. Hegarty^1, and Grigorii I. Shishkin^2

^1 Department of Mathematics & Statistics, University of Limerick
Limerick, Ireland,
ali.ansari@ul.ie, alan.hegarty@ul.ie
^2 Institute of Mathematics and Mechanics, Russian Academy of Sciences
Ekaterinburg, Russia
grigorii@shishkin.ural.ru



Abstract. We consider the classical problem of a two-dimensional lam- 
inar jet of incompressible fluid flowing into a stationary medium of the 
same fluid [2]. The equations of motion are the same as the boundary 
layer equations for flow over an infinite flat plate, but with different
boundary conditions. Numerical experiments show that, using an appro- 
priate piecewise uniform mesh, numerical solutions are obtained which 
are parameter robust with respect to both the number of mesh nodes 
and the number of iterations required for convergence. 



1 Introduction 

Numerical methods for the solution of various linear singular perturbation prob- 
lems, which are uniformly convergent with respect to the perturbation parame- 
ter, were developed in, inter alia, [5,6,7]. The key idea in these methods is the use 
of piecewise uniform meshes, which are appropriately condensed in the boundary 
layer regions. It is of interest to determine whether these ideas can be used for 
nonlinear problems, in particular flow problems. We thus apply the technique to 
simple model problems, the exact solutions of which are available. In [7] it was
shown that, for the flat plate problem of Blasius [1,2], the method is uniformly 
convergent with respect to the perturbation parameter. 

Here we examine analogously the classical two-dimensional laminar jet problem [2].
A two-dimensional jet of fluid emerges from a narrow slit in a wall into
a static medium of the same fluid. If the jet is thin, such that u, the horizontal
component of velocity, varies much less rapidly along the jet, i.e., along the x-axis,
than across it, we have a boundary layer at y = 0, i.e., on the axis of the jet [3,4].
The pressure gradient is zero in the jet since it is zero in the surrounding fluid.
The equations of motion are therefore the same as the Prandtl boundary layer
equations [2], i.e.,

$$-\nu u_{yy} + u u_x + v u_y = 0, \qquad u_x + v_y = 0, \tag{1}$$

* This work was supported in part by the Enterprise Ireland grants SC-97-612 and SC-

but with the different boundary conditions

$$u_y(x,0) = v(x,0) = 0 \quad \forall x > 0, \qquad \lim_{y\to\pm\infty} u(x,y) = 0 \quad \forall x \in \mathbb{R}. \tag{2}$$

The primary equation of motion, involving the second derivative of u and
the viscosity ν, is clearly a singularly perturbed differential equation with ν as
the perturbation parameter. Our objective here is to obtain numerical solutions
to this problem that are robust with respect to ν. The sensitivity of classical
numerical methods to the perturbation parameter is reflected in the maximum
pointwise errors becoming unacceptably large for small ν. This has been shown
for linear problems, e.g., in [5], where it is also seen that inappropriate condensing
of the mesh in the boundary layer region fails to resolve the difficulty.

The approach adopted here will involve a piecewise uniform mesh [6], which,
when used in conjunction with an upwind finite difference method, leads to
parameter robust solutions, i.e., numerical solutions where the maximum pointwise
error tends to zero independently of the perturbation parameter, while the work
required to obtain the solutions is also independent of ν. As analytical solutions
of this particular problem are achievable we will use them to compute the
discretisation errors in the L^∞ norm.

It should be noted that Prandtl’s boundary layer equations are valid approximations
to the Navier-Stokes equations only for a small range of values of ν. As
there is no known parameter robust method for solving the Navier-Stokes equations,
even for this simple geometry, it is worthwhile considering the solution of
the simpler model, even for values of ν where it is not physically valid. Numerical
results will verify that the numerical method is indeed parameter robust.



2 The Analytical Solution

As mentioned in the previous section it is possible to obtain analytical solutions
to the jet problem under consideration here [2,3,4]. The solutions for u and v
are given here without derivation:

$$u = 6\nu\,\frac{x}{y^2}\,\varphi^2\,\mathrm{sech}^2\varphi, \tag{3}$$

$$v = 2\nu\,\frac{\varphi}{y}\left[2\varphi\,\mathrm{sech}^2\varphi - \tanh\varphi\right], \tag{4}$$

where $\varphi = \left(\dfrac{J_0}{48\rho\nu^2}\right)^{1/3}\dfrac{y}{x^{2/3}}$, ν is the viscosity, ρ is the density and J_0 is
defined as

$$\int_{-\infty}^{\infty} \rho u^2\,dy = J_0 = \text{constant}.$$

Furthermore, some simple analysis [1,2] shows that the thickness δ of the boundary
layer is

$$\delta \sim \left(\frac{\rho\nu^2}{J_0}\right)^{1/3} x^{2/3}. \tag{5}$$

Both ρ and J_0 are constants and we set ρ = J_0 = 1 here.
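
The closed-form solution above can be checked numerically. The following is a minimal Python sketch, not the authors' code: the function names `jet_solution` and `momentum_flux`, the integration half-width and the step count are illustrative assumptions. It evaluates (3) and (4) in the equivalent form obtained by writing φ = α y / x^{2/3} with α = (J_0/(48ρν²))^{1/3}, which avoids the division by y at y = 0, and then approximates the momentum integral by the trapezoidal rule.

```python
import math


def jet_solution(x, y, nu, rho=1.0, j0=1.0):
    """Evaluate the analytical jet velocities (u, v) at the point (x, y).

    Uses phi = alpha * y / x**(2/3), alpha = (J0 / (48 rho nu^2))**(1/3), so that
    u = 6 nu alpha^2 x^(-1/3) sech^2(phi) and
    v = 2 nu alpha x^(-2/3) [2 phi sech^2(phi) - tanh(phi)].
    """
    alpha = (j0 / (48.0 * rho * nu ** 2)) ** (1.0 / 3.0)
    phi = alpha * y / x ** (2.0 / 3.0)
    tanh_phi = math.tanh(phi)
    sech2 = 1.0 - tanh_phi ** 2  # sech^2 = 1 - tanh^2, safe for large |phi|
    u = 6.0 * nu * alpha ** 2 * x ** (-1.0 / 3.0) * sech2
    v = 2.0 * nu * alpha * x ** (-2.0 / 3.0) * (2.0 * phi * sech2 - tanh_phi)
    return u, v


def momentum_flux(x, nu, rho=1.0, j0=1.0, half_width=60.0, n=20000):
    """Trapezoidal approximation of the integral of rho*u^2 over y (assumed cut-off)."""
    h = 2.0 * half_width / n
    total = 0.0
    for k in range(n + 1):
        y = -half_width + k * h
        u, _ = jet_solution(x, y, nu, rho, j0)
        weight = 0.5 if k in (0, n) else 1.0
        total += weight * rho * u * u
    return total * h
```

With ρ = J_0 = 1 the computed flux should stay close to 1 at every station x, which is a quick consistency check on the reconstructed formulas; the cut-off `half_width` must be enlarged or the step refined if ν is made very small.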
3 The Numerical Solution

To begin with we must decide on a domain of solution. We confine consideration
to a finite rectangle Ω = (a, A) × (0, B), where the constants a, A and B are
fixed and independent of the perturbation parameter ν. We fix a > 0 as the
equations are singular at x = 0 (this is apparent from (3) & (4)). The size of
the near-wall subdomain where the equations are not appropriate increases with
1/ν and thus allowing a to increase as ν → 0 would make the problem easier.
However, we require the method to work well on a fixed domain and thus fix a.
We denote the boundary of Ω by Γ = Γ_L ∪ Γ_R ∪ Γ_T ∪ Γ_B where Γ_L, Γ_R, Γ_T and
Γ_B denote the left, right, top and bottom edges of Ω respectively.

We are now in a position to define the computational mesh for this problem.
On the rectangular domain Ω we place the piecewise uniform rectangular mesh
$\Omega_\nu^N$, which is defined as the tensor product $\Omega_\nu^N = \Omega^{N_x} \times \Omega_\nu^{N_y}$ where N = (N_x, N_y).
Here $\Omega^{N_x}$ is a uniform mesh over the interval [a, A] with N_x mesh
intervals, while $\Omega_\nu^{N_y}$ is a piecewise uniform fitted mesh with N_y mesh intervals
on the interval [0, B]. The interval [0, B] is divided into two subintervals [0, σ]
and [σ, B], and N_y/2 uniform mesh intervals are assigned to each subinterval.
Note that in this paper we set N_x = N_y = N.

The transition point σ is of significance as, by reducing σ as ν decreases, the
mesh in the neighbourhood of the x-axis will be condensed. σ is chosen, following
the principles set out in [6] and [7], as

$$\sigma = \min\left\{\tfrac{1}{2}B,\; 2\nu^{2/3}\ln N\right\}.$$

The choice of ν^{2/3} is motivated from (5), while the particular choice of the
constant 2 is based on experimental work, which seems to suggest this as a near
optimal value giving reasonable convergence rates for the iterative process.

Note that though (5) shows that the jet spreads out as x increases, the choice
of σ ignores this. The reason for this is that the errors dominate near x = a;
when the jet spreads beyond y = σ, the velocity and errors are much reduced.
This reiterates the simplicity of the solution technique.

We linearise the first equation by adapting the continuation algorithm set
out in [7] for the problem of flow past a flat plate. In the case of the jet problem,
we encounter stability difficulties and thus we need to generalise the algorithm
from [5], as elaborated below.

After linearisation and discretisation of (1) and the associated boundary
conditions (2) we have the sequence of discrete linear problems for m = 0, 1, . . .:

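with boundary conditions

$$D_y^0 U_\nu^m(x_i,y_0) = 0, \quad V_\nu^m(x_i,y_0) = 0, \quad U_\nu^m(x_i,y_N) = 0, \tag{7}$$

where

$$D_y^+ U_\nu^m(x_i,y_j) = \frac{U_\nu^m(x_i,y_{j+1}) - U_\nu^m(x_i,y_j)}{y_{j+1} - y_j}, \qquad D_y^- U_\nu^m(x_i,y_j) = \frac{U_\nu^m(x_i,y_j) - U_\nu^m(x_i,y_{j-1})}{y_j - y_{j-1}},$$

with analogous definition of $D_x^- U_\nu^m(x_i,y_j)$ and $D_y^- V_\nu^m(x_i,y_j)$,

and where

$$D_y^0 U_\nu^m(x_i,y_j) = \frac{U_\nu^m(x_i,y_{j+1}) - U_\nu^m(x_i,y_{j-1})}{y_{j+1} - y_{j-1}},$$

$$\delta_y^2 U_\nu^m(x_i,y_j) = \frac{D_y^+ U_\nu^m(x_i,y_j) - D_y^- U_\nu^m(x_i,y_j)}{(y_{j+1} - y_{j-1})/2},$$

$$D_y^u U_\nu^m(x_i,y_j) = \begin{cases} D_y^- U_\nu^m(x_i,y_j) & \text{for } V_\nu^{m-1}(x_i,y_j) > 0, \\ D_y^+ U_\nu^m(x_i,y_j) & \text{for } V_\nu^{m-1}(x_i,y_j) \le 0. \end{cases}$$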

In addition,

$$\tilde U_\nu^{m-1}(x_i,y_j) = \theta_1 U_\nu^{m-1}(x_i,y_j) + (1-\theta_1)\,U_\nu^{m-2}(x_i,y_j),$$

$$\tilde V_\nu^{m-1}(x_i,y_j) = \theta_2 V_\nu^{m-1}(x_i,y_j) + (1-\theta_2)\,V_\nu^{m-2}(x_i,y_j),$$

where the parameters 0 < θ_1, θ_2 ≤ 1 are selected to stabilise the iterative process
as ν becomes smaller. For large ν we set θ_1 = θ_2 = 1. Experimentally, it has
been noted that when ν < 2“^^ the number of iterations starts to increase but
this problem is easily overcome by appropriate choice of θ_1, θ_2.



4 Numerical Results 



The analytical solution has a singularity at x = 0. This means that the choice
of the constants that define the x-range of the domain, i.e. x ∈ [a, A], needs to be
restricted to a > 0 to avoid the singularity. Here we (arbitrarily) set a = 0.1, A =
1.1 and B = 1. The piecewise uniform mesh for this problem, $\Omega_\nu^N = \{(x_i,y_j)\}$,
with the above constants is

$$x_i = x_{i-1} + h, \qquad y_j = \begin{cases} 2j\sigma/N, & j = 0,1,2,\ldots,N/2, \\ \sigma + 2(j-N/2)(1-\sigma)/N, & j = N/2,\ldots,N, \end{cases}$$

where

$$\sigma = \min\left\{\tfrac{1}{2},\; 2\nu^{2/3}\ln N\right\}.$$
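
The mesh construction can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `shishkin_mesh` and its argument names are assumptions, the uniform step in x is taken as h = (A − a)/N (which follows from placing N equal intervals on [a, A]), and the factor (1 − σ) is written as (B − σ) so that the same code covers B ≠ 1.

```python
import math


def shishkin_mesh(n, nu, a=0.1, big_a=1.1, b=1.0):
    """Return (x, y, sigma) for the piecewise uniform mesh with n intervals per direction.

    x is uniform on [a, big_a]; y is uniform on [0, sigma] and on [sigma, b],
    with n/2 intervals in each part (n is assumed to be even).
    """
    sigma = min(b / 2.0, 2.0 * nu ** (2.0 / 3.0) * math.log(n))
    h = (big_a - a) / n
    x = [a + i * h for i in range(n + 1)]
    half = n // 2
    y = [2.0 * j * sigma / n for j in range(half + 1)]
    y += [sigma + 2.0 * (j - half) * (b - sigma) / n for j in range(half + 1, n + 1)]
    return x, y, sigma
```

For ν = 1 the minimum is attained by B/2, so the mesh is uniform; as ν decreases the transition point moves towards the axis y = 0 and half of the mesh points are concentrated in the layer.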
At this point we summarise the problem as

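$(P_\nu^N)$: Find $(U_\nu, V_\nu)$ such that, for all $(x_i,y_j) \in \Omega_\nu^N$,

$$-\nu\,\delta_y^2 U_\nu^m(x_i,y_j) + \tilde U_\nu^{m-1} D_x^- U_\nu^m(x_i,y_j) + \tilde V_\nu^{m-1} D_y^u U_\nu^m(x_i,y_j) = 0,$$

$$D_x^- U_\nu^m(x_i,y_j) + D_y^- V_\nu^m(x_i,y_j) = 0,$$

$$D_y^0 U_\nu^m(x_i,y_0) = 0 \ \text{ and } \ V_\nu^m(x_i,y_0) = 0 \ \text{ on } \Gamma_B,$$

$$U_\nu^m = u \ \text{ on } \Gamma_L \cup \Gamma_T.$$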

The algorithm for solving $(P_\nu^N)$ sweeps across the domain Ω from Γ_L to Γ_R.
At the i-th stage of the sweep, we compute the values of $(U_\nu, V_\nu)$ on X_i =
{(x_i, y_j), 0 ≤ j ≤ N}, where $(U_\nu, V_\nu)$ are known on X_{i-1}. This is achieved by
solving the first linearised equation for $U_\nu$ followed by a solution of the second
linear equation for $V_\nu$.

In order to solve the first equation on X_i we need values of $U_\nu$ on X_{i-1},
boundary values for $U_\nu^m$ on Γ_B ∪ Γ_T and an initial guess $U_\nu^0$ on X_i. On each X_i,
the two-point boundary value problem for $U_\nu^m(x_i,y_j)$ is solved for 0 ≤ j ≤ N − 1.
Since $U_\nu^m(x_i,y_0)$ is thus an unknown, the term $D_y^u U_\nu^m(x_i,y_j)$ can and does
introduce the value $U_\nu^m(x_i,y_{-1})$, which is eliminated by implementing the central
difference approximation of the Neumann condition, so that all instances
of $U_\nu^m(x_i,y_{-1})$ are replaced by $U_\nu^m(x_i,y_1)$. The initial guess to start the
algorithm, i.e. $U_\nu^0$ on X_i, is taken from the prescribed boundary condition for $U_\nu$ (the
analytical solution) on Γ_L. For each X_i, $V_\nu^0$ is set to be zero.

Once the solution to the tridiagonal system of equations for $U_\nu^m$ is obtained
we then solve the linear system

$$D_x^- U_\nu^m(x_i,y_j) + D_y^- V_\nu^m(x_i,y_j) = 0, \quad 1 \le j \le N,$$

for $V_\nu^m$. The process here is trivial as $U_\nu^m$ is known from the previous step and $V_\nu^m$
is initialised using the boundary condition, i.e. $V_\nu^m = 0$ on Γ_B.
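
Why this step is called trivial can be seen from a short sketch: once U is known on the lines X_{i-1} and X_i, the discrete continuity equation gives V by a single forward pass in j. The Python below is illustrative only; the function name `march_v` and the list-based data layout are assumptions, and the backward differences are written out directly from the definitions of D_x^- and D_y^-.

```python
def march_v(u_prev, u_cur, y, hx):
    """Solve D_x^- U + D_y^- V = 0 on one mesh line for V, starting from V = 0 at y[0].

    u_prev, u_cur: values of U on the lines X_{i-1} and X_i (same length as y).
    hx: the uniform mesh step in x.
    """
    v = [0.0]  # boundary condition V = 0 on the bottom edge
    for j in range(1, len(y)):
        dx_u = (u_cur[j] - u_prev[j]) / hx          # backward difference in x
        v.append(v[-1] - (y[j] - y[j - 1]) * dx_u)  # from (V_j - V_{j-1})/(y_j - y_{j-1}) = -dx_u
    return v
```

Each V value costs one subtraction and one multiplication, so this step is negligible next to the tridiagonal solve for U.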

This process is continued until a stopping criterion is achieved. This involves
setting the tolerance tol for the difference between two successive iterates, i.e.

$$\max\left(|U_\nu^m - U_\nu^{m-1}|,\ |V_\nu^m - V_\nu^{m-1}|\right) \le tol,$$

where we take tol to be 10“®. We let m = M for all instances where the stopping
criterion is met. Once this happens we set $U_\nu = U_\nu^M$ and $V_\nu = V_\nu^M$ on X_i and proceed
to the next step using $U_\nu(x_i,y_j)$ as the initial guess on X_{i+1}.

Graphs of the solution {Uu,V,y) using this direct method with N = 32 for 
1 / = 1 and ly = are shown in Figs. 1 and 2. Graphs of the errors in the 
numerical solutions for v = 2“^° are shown in Fig. 3. Additionally, we approx- 
imate the scaled partial derivatives and ^ by the corresponding 

scaled discrete derivatives D~U,^,D~Ui, and D~V„. Note that |y = — and 
correspondingly D~Ui, = —D~V,^. 









Fig. 2. Surface plots of numerical solutions on \ v = 2 N = 32 




Table 1 lists the maximum errors and corresponding ν-uniform convergence
rates for the velocity components (u, v) and their scaled derivatives on $\Omega_\nu^N$. It is
evident that all the results are robust apart from the scaled approximation to v_x,
which is robust only for a subdomain of $\Omega_\nu^N$ which excludes a neighbourhood of
x = a, for example $\Omega_\nu^N \cap ((0.2, 1.1] \times [0, 1])$, as in the last 2 rows of Table 1.
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Fig. 5. Surface plot of approximations to the derivatives on \ v = 2 N = 32 



Table 2 shows that with a simple choice of θ_1 = θ_2 = 0.75 for all N and ν
the number of iterations per ‘time-like’ step x_i (i.e., total number of iterations
divided by N) increases only very slightly with 1/ν.

5 Summary 

We have demonstrated through experimental results that the numerical method
and associated algorithm give solutions for the velocity terms and their scaled
discrete derivatives which appear to be uniformly convergent with respect to the
viscosity ν. The number of iterations of the algorithm depends weakly on ν but
it is believed that this can also be rectified. However, the method is not claimed
to be optimal, and future work will involve the investigation of alternative methods
of solution of the nonlinear system of equations. Other matters for further
investigation include the dependence of the numerical solutions on the distance
Table 1. Maximum pointwise errors and associated ν-uniform rates






N                                            32          64          128         256         512

||U_ν − u||                                  0.53(-01)   0.31(-01)   0.18(-01)   0.10(-01)   0.57(-02)
  rate                                       0.78        0.80        0.81        0.83
||V_ν − v||                                  0.23(+00)   0.14(+00)   0.11(+00)   0.81(-01)   0.53(-01)
  rate                                       0.73        0.33        0.47        0.62
scaled ||D_x U_ν − u_x||                     0.45(+00)   0.44(+00)   0.32(+00)   0.21(+00)   0.13(+00)
  rate                                       0.02        0.46        0.62        0.74
ν ||D_y U_ν − u_y||                          0.37(+00)   0.33(+00)   0.22(+00)   0.14(+00)   0.77(-01)
  rate                                       0.15        0.57        0.73        0.81
ν ||D_x V_ν − v_x||                          0.17(+02)   0.27(+02)   0.34(+02)   0.38(+02)   0.40(+02)
  rate                                       -0.67       -0.30       -0.18       -0.08

scaled ||D_y V_ν − v_y|| (subdomain)         0.45(+00)   0.44(+00)   0.32(+00)   0.21(+00)   0.13(+00)
  rate                                       0.02        0.46        0.62        0.74
ν ||D_x V_ν − v_x|| (subdomain)              0.12(+01)   0.82(+00)   0.53(+00)   0.30(+00)   0.17(+00)
  rate                                       0.54        0.63        0.82        0.84



Table 2. Number of one-dimensional linear solves to attain a solution (scaled
by factor 1/N) with θ_1 = θ_2 = 0.75

ν \ N      32    64    128   256   512

1          6     6     6     5     5
2^-4       13    13    13    12    12
2^-8       15    15    15    16    17
2^-12      14    14    14    14    16
2^-16      14    14    14    13    14
2^-20      15    15    15    14    14
2^-24      17    17    16    16    16
2^-28      19    18    18    17    18


from the wall a and a comparison of the value of Jq at J/j with the imposed 
value at 
Abstract. We develop a new algorithm for solving Toeplitz linear least
squares problems. The Toeplitz matrix is first embedded into a circulant
matrix. The linear least squares problem is then transformed into a dis- 
crete least squares approximation problem for polynomial vectors. Our 
implementation shows that the normwise backward stability is indepen- 
dent of the condition number of the Toeplitz matrix. 



1 Toeplitz Linear Least Squares Problems 

Let m ≥ n ≥ 1, $t_{-n+1}, \ldots, t_{m-1} \in \mathbb{C}$ and

$$T := \left[\,t_{j-k}\,\right]_{j=0,\ldots,m-1}^{k=0,\ldots,n-1}$$

an m × n Toeplitz matrix that has full column-rank. Let $b \in \mathbb{C}^m$. We want to
solve the corresponding Toeplitz linear least squares problem (LS-problem), i.e.,
we want to determine the (unique) vector $x \in \mathbb{C}^n$ such that

$$\|Tx - b\| \ \text{is minimal} \tag{1}$$

where ‖·‖ denotes the Euclidean norm.

Standard algorithms for least squares problems require O(mn²) floating point
operations (flops) for solving (1). The arithmetic complexity can be reduced by
taking into account the Toeplitz structure of T. Several algorithms that require
only O(mn) flops have been developed. Such algorithms are called fast. One
of the first fast algorithms was introduced by Sweet in his PhD thesis [10].
This method is not numerically stable, though. Other approaches include those 
by Bojanczyk, Brent and de Hoog [1], Chun, Kailath and Lev-Ari [3], Qiao [9], 

* The work of the first and the third author is supported by the Belgian Programme on 
Interuniversity Poles of Attraction, initiated by the Belgian State, Prime Minister’s 
Office for Science, Technology and Culture. The scientific responsibility rests with
the authors. 



L. Vulkov, J. Wasniewski, and P. Yalamov (Eds.): NAA 2000, LNCS 1988, pp. 27—34, 2001. 
Cybenko [4,5], Sweet [11] and many more. None of these algorithms has yet been
shown to be numerically stable and for several approaches there exist examples 
indicating that the method is actually unstable. 

Recently, Ming Gu [7] has developed fast algorithms for solving Toeplitz and 
Toeplitz-plus-Hankel linear least squares problems. In his approach, the matrix is 
first transformed into a Cauchy-like matrix by using the Fast Fourier Transform 
or trigonometric transformations. Then the corresponding Cauchy-like linear 
least squares problem is solved. Numerical experiments show that this approach 
is not only efficient but also numerically stable, even if the coefficient matrix is 
very ill-conditioned. 

In this paper we will also develop a numerically stable method that works 
for ill-conditioned problems — in other words, for problems that cannot be solved 
via the normal equations approach. We proceed as follows. The original LS- 
problem is first embedded into a larger LS-problem. The coefficient matrix of 
the latter problem has additional structure: it is a circulant block matrix. This 
LS-problem is then (unitarily) transformed into a LS-problem whose coefficient 
matrix is a coupled Vandermonde matrix. The latter LS-problem is then solved 
by using the framework of orthogonal polynomial vectors. 



2 Embedding of the Original LS-Problem 



We embed the original LS-problem (1) in the following way. Let A and B be
matrices and let a and y be vectors. The extended LS-problem is formulated as
follows: determine the vectors x and y such that the norm of the vector

$$r := \begin{bmatrix} A & B \\ T & 0 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} - \begin{bmatrix} a \\ b \end{bmatrix}$$

is minimal. (We assume, of course, that A, B, a and y have appropriate sizes.)

If the matrix B is nonsingular, then the first ‘component’ x of the solution
$\begin{bmatrix} x \\ y \end{bmatrix}$ of the extended LS-problem coincides with the solution x of the original
LS-problem for any choice of A, B and a. We can always choose A and B such that
the two block columns

$$C_1 := \begin{bmatrix} A \\ T \end{bmatrix} \quad \text{and} \quad C_2 := \begin{bmatrix} B \\ 0 \end{bmatrix}$$

are circulant matrices. For example, we can choose B equal to the identity matrix
of order n − 1 and we can choose A as the (n − 1) × n Toeplitz matrix

$$A := \left[\,t_{-n+1+j-k}\,\right]_{j=0,\ldots,n-2}^{k=0,\ldots,n-1}$$

with $t_{-n-k} = t_{m-k-1}$ for k = 0, 1, . . . , n − 1. We take a to be the zero vector.
However, we can also choose the size of B larger to obtain a number of rows M
for the two circulant matrices C_1 and C_2 such that the discrete Fourier transform
of size M can be computed efficiently. For example, we could choose M as the
smallest power of two larger than or equal to m + n − 1. The matrices A and B
are now chosen to have sizes (M − m) × n and (M − m) × (M − m), respectively.
Note that B is square and assumed to be nonsingular.
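
The smallest embedding (M = m + n − 1) is easy to write down explicitly. The sketch below is illustrative Python, not the authors' Matlab code; the function name `circulant_embedding` and the convention that the input list holds t_{-n+1}, ..., t_{m-1} in that order are assumptions. It builds C_1 by wrapping indices modulo M, so that the bottom m rows reproduce T and the top n − 1 rows are the block A defined above.

```python
def circulant_embedding(t, m, n):
    """Build the M x n block column C_1 = [A; T] with M = m + n - 1.

    t lists the Toeplitz entries t_{-n+1}, ..., t_{m-1} in that order, so the
    first column of C_1 is t itself and entry (i, k) is t[(i - k) mod M].
    """
    big_m = m + n - 1
    if len(t) != big_m:
        raise ValueError("expected m + n - 1 Toeplitz entries")
    return [[t[(i - k) % big_m] for k in range(n)] for i in range(big_m)]
```

For example, with m = 4, n = 3 and t_k = k the last four rows of the result have entries j − k, which is exactly T, while the entry in row 0, column 1 wraps around to t_{m-1}.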



3 Transformation of the Extended LS-Problem 



Define C_3 as the vector

$$C_3 := -\begin{bmatrix} a \\ b \end{bmatrix}.$$

The vector C_3 can be interpreted as the first column of a circulant matrix.
The extended LS-problem can therefore be formulated as follows: determine the
vectors x and y such that the norm of the vector

$$r = \begin{bmatrix} C_1 & C_2 & C_3 \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} \in \mathbb{C}^M$$

is minimal. Note that the matrix [C_1 C_2 C_3] is of size M × (n + M − m + 1).
It is well-known that a p × p circulant matrix C can be factorized as

$$C = F_p^H \Lambda F_p$$

where Λ is a p × p diagonal matrix containing the eigenvalues of C and F_p denotes
the p × p Discrete Fourier Transform matrix (DFT-matrix)

$$F_p := \frac{1}{\sqrt{p}} \left[\,\omega_p^{jk}\,\right]_{j,k=0,\ldots,p-1}$$

where $\omega_p := e^{-2\pi i/p}$ and $i = \sqrt{-1}$. Similarly, if C is of size p × q, where p ≥ q,
then C can be factorized as

$$C = F_p^H \Lambda F_{p,q}$$

where Λ is again a p × p diagonal matrix and where F_{p,q} denotes the p × q
submatrix of F_p that contains the first q columns of F_p.
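
The factorization says that a circulant matrix acts diagonally in the Fourier basis. A small Python sketch can confirm this on a concrete vector. It is illustrative only: the function names are assumptions, the transforms are written as plain O(p²) sums rather than an FFT, and the unnormalised transform with kernel e^{-2πi jk/p} is used, so the eigenvalues are simply the DFT of the first column.

```python
import cmath


def dft(v):
    """Unnormalised discrete Fourier transform with kernel exp(-2*pi*i*j*k/p)."""
    p = len(v)
    return [sum(v[j] * cmath.exp(-2j * cmath.pi * j * k / p) for j in range(p))
            for k in range(p)]


def idft(v):
    """Inverse of dft (kernel exp(+2*pi*i*j*k/p), scaled by 1/p)."""
    p = len(v)
    return [sum(v[k] * cmath.exp(2j * cmath.pi * j * k / p) for k in range(p)) / p
            for j in range(p)]


def circulant_matvec_fourier(c, x):
    """Multiply the circulant matrix with first column c by x via the DFT."""
    eigenvalues = dft(c)  # diagonal of Lambda for this normalisation
    x_hat = dft(x)
    return idft([lam * xh for lam, xh in zip(eigenvalues, x_hat)])


def circulant_matvec_direct(c, x):
    """Reference product using the entries C[i][k] = c[(i - k) mod p]."""
    p = len(c)
    return [sum(c[(i - k) % p] * x[k] for k in range(p)) for i in range(p)]
```

Both routines should agree to rounding error. In practice the sums would be replaced by a fast Fourier transform, which is what makes the choice of M as a power of two attractive.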

By applying the Discrete Fourier Transform to r, the norm of r remains
unchanged: ‖r‖ = ‖F_M r‖. The following holds:

$$F_M r = F_M \begin{bmatrix} C_1 & C_2 & C_3 \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} \tag{2}$$

$$\phantom{F_M r} = \begin{bmatrix} \Lambda_1 F_{M,n} & \Lambda_2 F_{M,s} & \Lambda_3 F_{M,1} \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} \tag{3}$$

where s := M − m and where $\Lambda_j =: \mathrm{diag}(\lambda_{j,k})_{k=0}^{M-1}$ is an M × M diagonal matrix
for j = 1, 2, 3.

We will now translate the extended LS-problem into polynomial language.
Define x(z) and y(z) as

$$x(z) := \sum_{k=0}^{n-1} x_k z^k \quad \text{and} \quad y(z) := \sum_{k=0}^{s-1} y_k z^k.$$

Here x_k and y_k denote the components of the vectors x and y. The DFT-matrix
F_M can be interpreted as a Vandermonde matrix based on the nodes $z_k = \omega_M^k$,
k = 0, 1, . . . , M − 1. Equation (3) now implies that the extended LS-problem
can be formulated in the following way: determine the polynomials
x(z) and y(z), where deg x(z) ≤ n − 1 and deg y(z) ≤ s − 1, such that

$$\sum_{k=0}^{M-1} \left| \lambda_{1,k}\,x(z_k) + \lambda_{2,k}\,y(z_k) + \lambda_{3,k} \right|^2 \tag{4}$$

is minimal.



4 Orthogonal Polynomial Vectors 



The minimisation problem (4) can be solved within the framework of orthogonal
polynomial vectors developed by Van Barel and Bultheel [2,12,13,14]. The following
notation will be used: to indicate that the degree of the first component
of a polynomial vector $P \in \mathbb{C}[z]^{3\times 1}$ is less than or equal to α, that the degree
of the second component of P is less than 0 (hence, this second component is
equal to the zero polynomial), and that the degree of the third component is
equal to β, we write

$$\deg P = \begin{bmatrix} \alpha \\ -1 \\ \beta \end{bmatrix}.$$

We consider the following inner product and norm.

Definition 1 (inner product, norm). Consider the subspace $\mathcal{P} \subset \mathbb{C}[z]^{3\times 1}$ of
polynomial vectors P of degree

$$\deg P = \begin{bmatrix} n \\ s \\ 0 \end{bmatrix}.$$

Given the points $z_k \in \mathbb{C}$ and the weight vectors

$$F_k = \begin{bmatrix} \lambda_{1,k} & \lambda_{2,k} & \lambda_{3,k} \end{bmatrix}, \quad k = 1, 2, \ldots, M,$$

we define the discrete inner product ⟨P, Q⟩ for two polynomial vectors $P, Q \in \mathcal{P}$
as follows:

$$\langle P, Q \rangle := \sum_{k=1}^{M} P^H(z_k)\, F_k^H F_k\, Q(z_k). \tag{5}$$

The norm ‖P‖ of a polynomial vector $P \in \mathcal{P}$ is defined as:

$$\|P\| := \sqrt{\langle P, P \rangle}.$$

A necessary and sufficient condition for (5) to be an inner product in $\mathcal{P}$ is that $\mathcal{P}$
is a subspace of polynomial vectors such that a nonzero polynomial vector $P \in \mathcal{P}$
for which ⟨P, P⟩ = 0 (or equivalently: F_k P(z_k) = 0, k = 1, 2, . . . , M) does not
exist. Our original LS-problem can now be stated as the following discrete least
squares approximation problem: determine the polynomial vector $P^* \in \mathcal{P}'$ such
that $\|P^*\| = \min_{P \in \mathcal{P}'} \|P\|$, where $\mathcal{P}'$ denotes all vectors belonging to $\mathcal{P}$ and
having their third component equal to the constant polynomial 1.

In [14], Van Barel and Bultheel formulated a fast algorithm for computing
an orthonormal basis for $\mathcal{P}$. The degree sequence of the basis vectors B_j, j =
1, 2, . . . , δ, is as follows:

$$\begin{array}{ccccccccccc}
0 & 1 & \cdots & n-s & n-s & n-s+1 & n-s+1 & \cdots & n & n & n \\
-1 & -1 & \cdots & -1 & 0 & 0 & 1 & \cdots & s-1 & s & s \\
-1 & -1 & \cdots & -1 & -1 & -1 & -1 & \cdots & -1 & -1 & 0
\end{array}$$

Every polynomial vector $P \in \mathcal{P}'$ can be written (in a unique way) as:

$$P = \sum_{j=1}^{\delta} a_j B_j$$

where $a_1, \ldots, a_\delta \in \mathbb{C}$. The coordinate $a_\delta$ is determined by the fact that the third
component polynomial of P has to be monic and of degree 0. The following
holds:

$$\|P\|^2 = \langle P, P \rangle = \Big\langle \sum_{j=1}^{\delta} a_j B_j,\ \sum_{j=1}^{\delta} a_j B_j \Big\rangle = \sum_{j=1}^{\delta} |a_j|^2 \quad (\text{since } \langle B_i, B_j \rangle = \delta_{ij}).$$

It follows that ‖P‖ is minimized by setting $a_1, \ldots, a_{\delta-1}$ equal to zero. In other
words,

$$P^* = a_\delta B_\delta \quad \text{and} \quad \|P^*\| = |a_\delta|.$$

The discrete least squares approximation problem can therefore be solved by
computing the orthonormal polynomial vector $B_\delta$. We obtain P* by scaling $B_\delta$
to make its third component monic.

5 Numerical Experiments

We have implemented our approach in Matlab (MATLAB Version 5.3.0.10183
(R11) on LNX86). The numerical experiments that we will present in this section
are similar to those done by Ming Gu in [7]. The computations have been done
in double precision arithmetic with unit roundoff u ≈ 1.11 × 10^{-16}. We have
considered two approaches:

— QR: the QR method as implemented in Matlab. This is a classical approach 
for solving general dense linear least squares problems; 

— NEW: the approach that we have described in the previous sections. 

We have compared the two approaches QR and NEW for two types of Toeplitz 
matrices: 

— Type 1: the entries tk are taken uniformly random in the interval (0, 1); 

— Type 2: t_0 := 2ω and t_k := sin(2πωk)/(πk) for k ≠ 0 where ω := 0.25. This matrix
is called the Prolate matrix and is very ill-conditioned [6,15].

The right-hand side vector b has been chosen in two ways: 

— Its entries are generated uniformly random in (0, 1). This generally leads to 
large residuals. 

— The entries of b are computed such that b = Tx where the entries of x are 
taken uniformly random in (0, 1). In this case, we obtain small residuals. 

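To measure the normwise backward error, we have used the following result of
Waldén, Karlson and Sun [16]. See also [8, section 19.7].

Theorem 1. Let $A \in \mathbb{R}^{m\times n}$, $b \in \mathbb{R}^m$, $0 \ne x \in \mathbb{R}^n$, and r := b − Ax. Let $\theta \in \mathbb{R}$.
The normwise backward error

$$\eta_F(x) := \min\left\{ \|[\Delta A, \theta \Delta b]\|_F : \|(A + \Delta A)x - (b + \Delta b)\|_2 = \min \right\}$$

is given by

$$\eta_F(x) = \min\left\{ \eta_1,\ \sigma_{\min}\left([A \ \ \eta_1 C]\right) \right\}$$

where

$$\eta_1 := \frac{\|r\|_2}{\|x\|_2}\sqrt{\mu}, \qquad C := I - \frac{r r^T}{r^T r}$$

and

$$\mu := \frac{\theta^2 \|x\|_2^2}{1 + \theta^2 \|x\|_2^2}.$$

We have computed $\eta_F(x)$ with θ := 1.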

The numerical results are shown in Tables 1 and 2 for the two possible choices 
of the right-hand side vector b. 



6 Conclusions 

The numerical experiments show that the current implementation is still not 
accurate enough to be comparable with QR or with the algorithms developed 
by Ming Gu. However, the results show that the normwise backward error does 
not depend on the condition number of the Toeplitz matrix. We are currently 
working on improving the accuracy as well as the speed of the implementation 
to obtain a viable alternative for the algorithms of Ming Gu where the Toeplitz 
matrix can range from well-conditioned to very ill-conditioned.

Table 1. Normwise backward error (small residuals)

Matrix   Order          κ(T)           η_F(x)/u
type     m      n                      QR             NEW

1        160    150     5.4 × ~W       1.9 × 10^      1.7 × 10^
         320    300     3.4 × 10^      7.5 × 10^      9.1 × lO'*
         640    600     7.7 × 10^      5.9 × 10^      3.3 × 10®

2        160    150     2.1 × TcP'     3.9 × 10^      2.7 × ~W
         320    300     1.5 × 10^®     2.5 × 10°      5.5 × 10^
         640    600     1.3 × 10^®     2.8 × 10°      1.5 × 10°


Table 2. Normwise backward error (large residuals)

Matrix   Order          κ(T)           η_F(x)/u
type     m      n                      QR             NEW

1        160    150     5.4 × ■rF      4.1 × 10°      CO o × 10°
         320    300     3.4 × 10^      1.3 × 10^      2.5 × lO'*
         640    600     7.7 × 10^      1.1 × 10^      1.4 × 10®

2        160    150     2.1 × Tip'     1.3 × ■rF      3.9 × rF
         320    300     1.5 × 10°°     1.5 × 10°      8.2 × 10°
         640    600     1.3 × 10°°     2.7 × 10°      2.3 × 10°
Abstract. Classical accuracy estimation in problem solving is
based upon sensitivity analysis and conditioning computation. Such
an approach is frequently much more difficult than solving the problem
itself. Here a generic alternative through the concept of random arithmetic
is presented. These two alternatives are developed around the well-known
Sylvester equations. A Matlab implementation as a new object class is
discussed and numerically illustrated.



1 Introduction 

The Sylvester matrix equations (SME) are among the fundamental problems
in the theory of linear systems. That is why the question of their reliable
solution, including evaluation of their precision, is of great practical interest. The
conditioning of SME is well studied and different types of condition numbers have
been derived [1]. Unfortunately, perturbation bounds based on condition numbers
may produce pessimistic results, although better bounds based upon
local nonlinear analysis are now available [2]. In any case, only global results are
given, not a component-wise analysis. Lastly, this approach is usually much
more difficult from a numerical computation point of view than the problem
itself. Basically, the memory cost is O(n^4) and the flop count is O(n^6),
where n is the problem size (assuming for simplicity a square matrix unknown).

Random arithmetic is considered here as an alternative approach to compute
simultaneously the solution of a given problem and its accuracy. This
technique is fundamentally component-wise. Furthermore, its cost is basically
unchanged compared with the use of standard floating point, except that the
new unit is not a flop but a Random Flop, which is denoted here by "Rflop". In
our Matlab implementation, one Rflop is a small multiple of one flop plus some
overhead computations. So this generic technique is, a priori, very competitive
compared with more classical accuracy schemes.

The following notations are used later on: $\mathcal{R}^{m\times n}$ - the space of real m × n
matrices; $I_n$ - the unit n × n matrix; $A^T = [a_{ji}]$ - the transpose of the matrix
$A = [a_{ij}]$; vec(A) ∈ $\mathcal{R}^{mn}$ - the column-wise vector representation of the matrix
A ∈ $\mathcal{R}^{m\times n}$; A ⊗ B = [a_{ij}B] - the Kronecker product of the matrices A and B;
‖·‖_2 - the spectral (or 2-) norm in $\mathcal{R}^{m\times n}$; ‖·‖_F - the Frobenius (or F-) norm in $\mathcal{R}^{m\times n}$.

L. Vulkov, J. Wasniewski, and P. Yalamov (Eds.): NAA 2000, LNCS 1988, pp. 35-41, 2001.
(c) Springer-Verlag Berlin Heidelberg 2001



2 Problem Statement and Notations 



Consider the standard Sylvester equation:

$$AX + XB + C = 0 \tag{1}$$

where A ∈ $\mathcal{R}^{n\times n}$, B ∈ $\mathcal{R}^{m\times m}$ and X, C ∈ $\mathcal{R}^{n\times m}$. We suppose that
0 ∉ {λ_i(A) + λ_k(B) : i = 1, . . . , n, k = 1, . . . , m} where λ_i(M) are the eigenvalues of the
matrix M. Under this assumption, the equation (1) has a unique solution. Let the
matrices A, B and C be perturbed as A → A + ΔA, B → B + ΔB, C → C + ΔC
and let the perturbed Sylvester equation be defined by:

$$(A + \Delta A)Y + Y(B + \Delta B) + (C + \Delta C) = 0 \tag{2}$$
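
With the vec and Kronecker notation introduced above, equation (1) becomes the ordinary linear system (I_m ⊗ A + B^T ⊗ I_n) vec(X) = −vec(C). The following Python sketch illustrates this for small matrices. It is not the authors' Matlab implementation: the function name `solve_sylvester_kron` is an assumption, matrices are plain nested lists, and the system is solved by dense Gaussian elimination with partial pivoting, which is only sensible for very small n and m.

```python
def solve_sylvester_kron(a, b, c):
    """Solve A X + X B + C = 0 for X (n x m) through the Kronecker form.

    vec is taken column-wise, so entry (i, k) of X sits at position k*n + i.
    """
    n, m = len(a), len(b)
    size = n * m
    mat = [[0.0] * size for _ in range(size)]
    rhs = [0.0] * size
    for k in range(m):
        for i in range(n):
            row = k * n + i
            rhs[row] = -c[i][k]
            for j in range(n):          # (I_m kron A) part: sum_j A[i][j] X[j][k]
                mat[row][k * n + j] += a[i][j]
            for l in range(m):          # (B^T kron I_n) part: sum_l X[i][l] B[l][k]
                mat[row][l * n + i] += b[l][k]
    # Gaussian elimination with partial pivoting
    for col in range(size):
        piv = max(range(col, size), key=lambda r: abs(mat[r][col]))
        if mat[piv][col] == 0.0:
            raise ValueError("singular system: some eigenvalue sum is zero")
        mat[col], mat[piv] = mat[piv], mat[col]
        rhs[col], rhs[piv] = rhs[piv], rhs[col]
        for r in range(col + 1, size):
            factor = mat[r][col] / mat[col][col]
            for cc in range(col, size):
                mat[r][cc] -= factor * mat[col][cc]
            rhs[r] -= factor * rhs[col]
    vec_x = [0.0] * size
    for r in range(size - 1, -1, -1):
        acc = rhs[r] - sum(mat[r][cc] * vec_x[cc] for cc in range(r + 1, size))
        vec_x[r] = acc / mat[r][r]
    return [[vec_x[k * n + i] for k in range(m)] for i in range(n)]
```

The nm × nm system is also what makes condition estimation expensive: storing it and factorising it grows quickly with the problem size, which is the motivation for the cheaper random arithmetic approach.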



The perturbed equation (2) has a unique solution Y = X + ΔX in the
neighborhood of X if the perturbations (ΔA, ΔB, ΔC) are sufficiently small.
Denote by:

$$\Delta := [\Delta_A, \Delta_B, \Delta_C]^T \in \mathcal{R}_+^3 \tag{3}$$

the vector of absolute norm perturbations Δ_A := ‖ΔA‖_F, Δ_B := ‖ΔB‖_F and
Δ_C := ‖ΔC‖_F in the data matrices A, B, C; and a = ‖A‖_F, b = ‖B‖_F, c =
‖C‖_F, x = ‖X‖_F the Frobenius norms of the data and solution matrices. Lastly, it
is useful to define the relative perturbation vector
$\delta := [\Delta_A/a, \Delta_B/b, \Delta_C/c]^T = [\delta_A, \delta_B, \delta_C]^T \in \mathcal{R}_+^3$.



3 Sensitivity Analysis 

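Here, we consider local bounds for the perturbation Δ_X := ‖ΔX‖_F in the
solution of (1). These are bounds of the type

$$\Delta_X \le f(\Delta) + O(\|\Delta\|^2), \quad \Delta \to 0 \tag{4}$$

$$\Delta_X / x \le f(\Delta)/x + O(\|\Delta\|^2), \quad \Delta \to 0 \tag{5}$$

where f is a continuous function, non-decreasing in each of its arguments and
satisfying f(0) = 0. Particular cases of (4) and (5) are the well known linear
perturbation bounds [1]. Denote by M_X, M_A, M_B and M_C the following operators:
$M_X = I_m \otimes A + B^T \otimes I_n$, $M_A = X^T \otimes I_n$, $M_B = I_m \otimes X$, $M_C = I_{nm}$. Then
absolute condition numbers are given by:

$$K_A = \|M_X^{-1} M_A\|_2, \quad K_B = \|M_X^{-1} M_B\|_2, \quad K_C = \|M_X^{-1} M_C\|_2, \quad K_S = \|M_X^{-1} [M_A, M_B, M_C]\|_2 \tag{6}$$

and the corresponding linear perturbation estimates are:

$$\Delta_X \le K_A \Delta_A + K_B \Delta_B + K_C \Delta_C \quad \text{and} \quad \Delta_X \le K_S \|\Delta\|_2 \tag{7}$$

In the same way relative condition numbers and estimates will be:

$$k_A = \|M_X^{-1} M_a\|_2 / x, \quad k_B = \|M_X^{-1} M_b\|_2 / x, \quad k_C = \|M_X^{-1} M_c\|_2 / x, \quad k_S = \|M_X^{-1} [M_a, M_b, M_c]\|_2 / x \tag{8}$$

where M_a = a M_A, M_b = b M_B, M_c = c M_C, and lastly:

$$\Delta_X / x \le k_A \delta_A + k_B \delta_B + k_C \delta_C \quad \text{and} \quad \Delta_X / x \le k_S \|\delta\|_2 \tag{9}$$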



4 A Random Arithmetic Approach 



Each floating point operation produces a round-off error, hence there are
potentially two results, one rounded down (by default), the other rounded up
(by excess). Both legitimately represent the exact result. Consequently, if a
given algorithm contains k arithmetic operations, there are $2^k$ results $r_i$,
which all equally represent the theoretical result r. Let us denote by $\bar r$
the mean of the $r_i$. Then the basic idea is that the accuracy of the numerical
result given by the considered algorithm can be deduced from the dispersion of
the $r_i$, i.e. from their standard deviation $\sigma$. From a practical point of
view, some questions must be considered: firstly how to obtain the so-called
$r_i$, secondly how many $r_i$ must be computed, and lastly how to compute a
confidence interval [4]. It is commonly accepted that rounding errors are
uniformly distributed on [-1/2, +1/2] ulp for rounded floating point arithmetic
such as the IEEE standard ([0, +1] ulp for chopped arithmetic), where ulp means
Unit in the Last Place. From now on, for the sake of simplicity, it is supposed
that rounded arithmetic is used. Consider the elementary floating point
operation $z = fl(x \circ y)$. Then a particular $r_i$ can be obtained by
perturbing this result as:

$\tilde z = rfl(x \circ y) = z + e$ (10)

where rfl (random floating operation) is an alternative notation for fl. The
random perturbation e consists in adding 1 to, or subtracting 1 from, the last
bit of z, each with probability 1/4, and leaving z unchanged with probability
1/2. In practice it is sufficient to generate 3 to 5 realisations of z. Let us
denote this number by N. Consequently each standard floating point variable of
an algorithm is replaced by a set of N values, computed as follows:



$z_i = rfl(x_i \circ y_i), \quad i = 1, \dots, N$ (11)

Now, let us introduce the following notations:

$\bar z = E(z) = \Bigl(\sum_{i=1}^{N} z_i\Bigr) / N$ and $\sigma^2 = \Bigl(\sum_{i=1}^{N} (z_i - \bar z)^2\Bigr) / (N - 1)$.

Then the estimated number of "significant" bits is:

$nb(z) = \min\Bigl(\max\Bigl(\log_2 \frac{\sqrt{N}\,|\bar z|}{\tau_p\,\sigma},\ 0\Bigr),\ t\Bigr)$ (12)



where $\tau_p$ is the value of Student's distribution for a p% confidence
interval. Clearly the number of decimal digits is obtained with $\log_{10}$. To
the first order in $\beta^{-t}$, the numerical result z of an algorithm can be
modelled as in (13), where $z_{th}$ is the theoretical result, $\beta$ the
arithmetic base, t the number of base $\beta$ digits, the $u_i(d)$ are constants
depending only on the data and the considered algorithm, the $\alpha_i$ are the
values lost at the rounding step (standard floating point arithmetic effect),
and the $e_i$ are the applied perturbations (random floating point effect). The
fundamental point is that the following result must be valid:

$z = z_{th} + \sum_{i=1}^{k} u_i(d)\,\beta^{-t}\,[\alpha_i - e_i] + O(\beta^{-2t})$ (13)
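The random operation (10)-(11) and the sample mean and standard deviation can be
sketched as follows. This is only an illustration under stated assumptions, not
the authors' implementation: the helper names `rfl`, `rfl_samples` and
`mean_and_std` are hypothetical, "one unit in the last bit" is modelled with
`math.nextafter` (Python 3.9 or later), and the perturbation is applied to the
already rounded IEEE double result.

```python
import math
import random
import statistics

def rfl(rounded_value, rng):
    """Perturb a rounded result by +1, -1 or 0 units in the last place.

    Probabilities are 1/4, 1/4 and 1/2, following the description of (10).
    """
    u = rng.random()
    if u < 0.25:
        return math.nextafter(rounded_value, math.inf)    # add one to the last bit
    if u < 0.5:
        return math.nextafter(rounded_value, -math.inf)   # subtract one
    return rounded_value                                  # leave unchanged

def rfl_samples(op, xs, ys, rng):
    """z_i = rfl(x_i o y_i), i = 1..N, as in (11)."""
    return [rfl(op(x, y), rng) for x, y in zip(xs, ys)]

def mean_and_std(zs):
    """Sample mean and standard deviation with the N - 1 denominator."""
    return statistics.fmean(zs), statistics.stdev(zs)

rng = random.Random(0)      # fixed seed so that the run is repeatable
N = 5                       # number of realisations (3 to 5 in the text)
xs = [1.0 / 3.0] * N
ys = [3.0] * N
zs = rfl_samples(lambda a, b: a * b, xs, ys, rng)
z_bar, sigma = mean_and_std(zs)
```

Every realisation stays within one ulp of the unperturbed product, so the
dispersion `sigma` directly reflects the injected last-bit noise.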



The theoretical justifications can be found, for example, in [3]. Consequently
$E(z_i) = z_{th}$. In practice the following hypotheses must be verified: the
exponent and the sign of each floating point result do not depend on the random
perturbation, the number of rfl operations must be much larger than the number
of data on which the algorithm operates, and the mantissas of the data must be
(sufficiently) randomly distributed. These hypotheses are usually true for
real-life industrial problems. By contrast, computing the mean of n equal terms
violates some of these conditions. Moreover, the validity of the first order
approximation (13) may degrade when the accuracy of the computations decreases,
so that $E(z_i) \ne z_{th}$ may be observed. This situation can be checked
dynamically with nb(z). As a consequence, the algorithm must be stopped, for
example, when a division by a non-significant value (not necessarily 0) is
attempted, or when several operands or data are non-significant. At each
computation step, the number of "significant" bits is now available. What
happens when some of the operands of an rfl operation (10) have no significant
bits? The concept of "numerical zero" ($\hat 0$) offers an easy-to-implement
response, according to the definition:

$z = \hat 0 \iff z = 0$ or $nb(z) = 0$ (14)

This fundamental notion induces some other basic properties which are the
foundations of random floating point arithmetic. Some of them are the logical
tests $\hat\ne$, $\hat=$, $\hat\le$, $\hat<$, specified by:

$a \mathrel{\hat\ne} b \iff a \ne b$ and $nb(b - a) > 0$;  $a \mathrel{\hat\le} b \iff a \le b$ or $nb(b - a) = 0$;
$a \mathrel{\hat=} b \iff b - a = 0$ or $nb(b - a) = 0$;  $a \mathrel{\hat<} b \iff a < b$ and $nb(b - a) > 0$.
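A minimal sketch of the estimator nb(z), the numerical zero and the logical
tests is given below. Several points are assumptions rather than facts from the
text: the clamped form of (12) (bounded below by 0 and above by t = 53 bits for
IEEE doubles), the tabulated two-sided 95% Student quantiles, the handling of a
zero standard deviation, and all function names, which are hypothetical.
Comparisons between means are evaluated on the mean of the differences.

```python
import math
import statistics

# Two-sided 95% Student quantiles for N - 1 degrees of freedom, N = 2..5.
TAU_95 = {2: 12.706, 3: 4.303, 4: 3.182, 5: 2.776}

def nb(zs, t_bits=53):
    """Estimated number of significant bits of the sample zs (assumed form of (12))."""
    n = len(zs)
    z_bar = statistics.fmean(zs)
    sigma = statistics.stdev(zs)
    if z_bar == 0.0:
        return 0.0
    if sigma == 0.0:                      # all realisations agree exactly
        return float(t_bits)
    bits = math.log2(math.sqrt(n) * abs(z_bar) / (TAU_95[n] * sigma))
    return min(max(bits, 0.0), float(t_bits))

def is_numerical_zero(zs):
    """Definition (14): the mean is 0 or no bit is significant."""
    return statistics.fmean(zs) == 0.0 or nb(zs) == 0.0

def _diff(a, b):
    return [bi - ai for ai, bi in zip(a, b)]

def st_eq(a, b):
    return is_numerical_zero(_diff(a, b))

def st_ne(a, b):
    return not st_eq(a, b)

def st_lt(a, b):
    d = _diff(a, b)
    return statistics.fmean(d) > 0.0 and nb(d) > 0.0

def st_le(a, b):
    d = _diff(a, b)
    return statistics.fmean(d) >= 0.0 or nb(d) == 0.0

ones = [1.0, 1.0, 1.0]
noisy_ones = [1.0 + 2.0 ** -52, 1.0 - 2.0 ** -53, 1.0 + 2.0 ** -52]
twos = [2.0, 2.0, 2.0]
```

With these samples, `ones` and `noisy_ones` differ only by last-bit noise and are
therefore equal in the stochastic sense, whereas `ones` is strictly less than
`twos`.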

Further, nb(a) = nb(b) = 0 must be considered as a fatal error. Another
consequence is that, for example, a test like "if det(A) = 0, break" is now well
defined in random floating point arithmetic. Computing the determinants of the
Hilbert matrices and of their inverses gives det(A) = 0 "true" for dimensions
greater than 13 in double precision IEEE arithmetic: the computed values are
very small and very large, respectively, but have no significant bits. More
importantly, the rfpa objects do not obey the usual mathematical rules. This
explains why floating point numbers cannot be viewed as the numerical
counterpart of the set R. For random arithmetic it has to be noticed that:

">" is the negation of "≤" and "<" is transitive;
"≤" is not transitive and "=" is not transitive.

These fundamental properties explain why this approach is always successful as
long as the "practical" hypotheses are fulfilled. Furthermore, it can be
verified that



This means that nb(z), the estimated number of significant bits (12), has a
probability 0.39 of being pessimistic by more than one decimal place
(underestimation), and a probability ≈ 1 of never being optimistic by more than
one decimal place (overestimation).

5 A Matlab Implementation

In order to exhibit numerically how random floating point arithmetic works, a
Matlab (MathWorks product) implementation has been developed as a new object
class called "rfpa", for Random Floating Point Arithmetic. All the basic
operators working on the default class "double" have been overloaded in order to
be able to execute standard m-files. It has been chosen to apply the definition
(11) to operators more complex than the elementary operations. This idea has
been applied to built-in functions such as the trigonometric functions and the
basic linear algebra operators (det, eig, schur, \, ...). So, our rfpa
implementation mixes true random arithmetic and more global perturbations.
There is practically no difference as long as the considered algorithms are
(approximately) backward stable. In the latter case, perturbations are applied
to the data before each of the N executions is run. The default random
parameter values are N = 5 and p = 95%. However, these values can be changed
dynamically.

6 Solving Sylvester Equations

Some numerical examples are reported here to illustrate the two approaches
discussed previously. Our Sylvester test equation is defined by the Matlab
expression

    A = invhilb(n); Z = zeros(n,n); J = ones(n,n); A = [A, Z; J, A]; nn = length(A);
    B = invhilb(m); Z = zeros(m,m); J = ones(m,m); B = [B, Z; J, B]; mm = length(B);
    X = ones(nn,mm); C = -(A*X + X*B).

The sizes of the final A and B matrices are nn = 2n and mm = 2m respectively,
where m and n are parameters controlling the global difficulty of solving these
equations numerically, because their condition number increases very quickly
with m and n. It must be noticed that A, B and C are exact floating point
numbers, so no perturbation is introduced in the data of this problem. A first
Matlab output (on a PC with Matlab 5.3) is obtained with n = 2 and m = 3: no
optimistic estimation of significant digits; maximum pessimistic estimation 0.8
decimal place; structured condition number 7.92e+002; mean number of
significant digits: 12.8 from $K_S$ (6), 13.7 from random arithmetic, and 14.1
in reality. These three estimations are respectively called nK, nRf and nTr.
The first one is defined by $nK = -\log_{10}(\varepsilon K_S)$, where
$\varepsilon$ is the machine precision. The following table summarises 8 other
runs on increasingly ill-conditioned problems.



n  m  Ks           nK    nRf   nTr
3  2  7.9152e+002  12.8  13.2  14.5
4  1  1.5247e+004  11.5  12.6  13.1
5  1  5.3501e+005   9.9  11.6  12.5
3  4  5.1993e+004  10.9  12.1  12.8
4  4  1.0746e+005  10.6  12    12.5
5  4  2.8724e+006   9.2  11.7  12.2
5  6  1.4135e+008   7.5   9.6  10
6  8  2.3443e+011   4.3   6     6.4



Clearly the definition of nK (implicitly) assumes that the Bartels-Stewart
algorithm is backward stable, which is not always true. It is well known that
there are pathological cases where $\varepsilon K_S$ must be replaced by something like
$N \varepsilon K_S$ with $N \gg 1$. In any case, our comparison argument remains true a fortiori.
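As a small worked check of the definition of nK, the sketch below recomputes
the nK column from the tabulated values of Ks. It assumes that the machine
precision is that of IEEE double arithmetic, 2^-52; the names `EPS`, `ROWS` and
`n_k` are introduced here for illustration only.

```python
import math

EPS = 2.0 ** -52  # assumed machine precision (IEEE double)

# (n, m, Ks, nK as printed in the table)
ROWS = [
    (3, 2, 7.9152e+002, 12.8),
    (4, 1, 1.5247e+004, 11.5),
    (5, 1, 5.3501e+005, 9.9),
    (3, 4, 5.1993e+004, 10.9),
    (4, 4, 1.0746e+005, 10.6),
    (5, 4, 2.8724e+006, 9.2),
    (5, 6, 1.4135e+008, 7.5),
    (6, 8, 2.3443e+011, 4.3),
]

def n_k(ks, eps=EPS):
    """Number of significant decimal digits predicted from the condition number."""
    return -math.log10(eps * ks)

recomputed = [(n, m, n_k(ks)) for n, m, ks, _ in ROWS]
```

Under this assumption every recomputed value agrees with the printed nK to the
displayed precision, which supports reading the garbled definition as
nK = -log10(eps * Ks).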



7 Conclusion 

A new generic approach has been presented to estimate the accuracy of a
computed problem solution. This technique offers a componentwise analysis; it
is basically the least pessimistic estimate and is "never" optimistic by more
than one decimal place. For comparison purposes, only the global result (mean
number of significant digits) has been reported here, although an individual
number of significant digits is obtained for each solution component $X_{ij}$.
Evaluating precision via condition number computation usually has a complexity
greater than that of solving the problem itself. Consequently, random
arithmetic is basically cheaper and much less difficult than an approach via
any sensitivity technique. Artificial perturbation, an old concept [4], must be
considered as an alternative in most control theory problems.

Abstract. In this paper we study the combination of averaging theories
with the numerical integration of the averaged equations by means of
Chebyshev series methods, which yields the numerical solution as a short
Chebyshev series. The proposed scheme is applied to the artificial
satellite problem.



1 Introduction 

In the study of long term evolution of celestial bodies in Celestial Mechanics (like 
in very long time integration of the Solar System [10]) different averaging tech- 
niques are usually employed. Most of them are special algebraic and analytical 
techniques developed to facilitate the computation of averaged systems. 

In this paper we present the construction of seminumerical schemes for the
numerical integration of systems of differential equations by mixing averaging
theories and a series method for the numerical integration of the averaged
system. The approach that we follow employs the modified perturbation method
proposed in [5], which uses the Lie series formalism in a way that allows the
differential system to be split into two parts: one that follows a Hamiltonian
structure and another one that is non-Hamiltonian.

Afterwards, in the numerical integration of the averaged equations, we
consider a family of symmetric integrators. In particular, we use Runge-Kutta
collocation methods based on Chebyshev polynomials, which give a dense output
in the form of a Chebyshev series, as required if we are interested in obtaining
an “analytical” expression of the solution.

In the last section, the method is applied to the important problem of the 
orbital analysis of Earth’s artificial satellites subject to Hamiltonian (Earth po- 
tential) and non-Hamiltonian perturbations (the air-drag). 

2 Application of Lie Transforms in Averaging Systems of 
Ordinary Differential Equations 

The typical problem in averaging theory consists of solving the differential system

$\dot x(t) = f(t,x,\varepsilon) = \varepsilon f_1^{(0)}(t,x) + \dots + \varepsilon^m f_m^{(0)}(t,x) + \varepsilon^{m+1} \hat f(t,x,\varepsilon), \quad x(0,\varepsilon) = x_0,$

with f periodic in t. Let

$\dot y(t) = f^*(y,\varepsilon) = \varepsilon f_1^*(y) + \dots + \varepsilon^m f_m^*(y)$

be its truncated averaged system, calculated by any perturbation method. For
this system there is a general theorem about the validity of the averaging
method [11,12], which establishes that, under several conditions (among them,
$f(t,x,\varepsilon)$ smooth and periodic in t), there exist constants
$c, \varepsilon_0, T$ such that $\|x(t,\varepsilon) - y(t,\varepsilon)\| \le c\,\varepsilon^m$
for $0 \le \varepsilon \le \varepsilon_0$ and $0 \le t \le T/\varepsilon$.

The ultimate aim of any averaging method is to find the near-identity
transformation that gives us the averaged system, but most of the proposed
methods (see [11,12]) give explicitly only the direct transformation, and not
both the direct and the inverse ones. So, these theories do not give good
averaged initial conditions.

A perturbation method that gives both the direct and inverse transformations,
as well as the averaged system, is the Lie-Deprit method [7] for Hamiltonian
systems and its adaptations to general differential systems [9]. Here we use a
modification given in [5], where the Hamiltonian and vectorial treatments of
Lie transform theory are combined for a system of differential equations in
which part of the perturbing terms have a Hamiltonian nature.



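Theorem 1. [5] Given a differential system

$\frac{dx}{dt} = f(x,\varepsilon) = \sum_{i \ge 0} \frac{\varepsilon^i}{i!}\, f_i^{(0)}(x)$ (1)

such that x = (q,p) represents a set of canonical variables with coordinates q
and momenta p. Moreover, the functions $f_i^{(0)}$, $i \ge 0$, are decomposed
into two different parts, $f_i^{(0)}(x) = f_{H\,i}^{(0)}(x) + f_{NH\,i}^{(0)}(x)$,
such that the $f_{H\,i}^{(0)}$ come from a Hamiltonian
$\mathcal H = \sum_{i \ge 0} \frac{\varepsilon^i}{i!}\, \mathcal H_i^{(0)}$. Then Eq. (1) is
transformed, through the generating function

$W(x,\varepsilon) = \sum_{i \ge 1} \frac{\varepsilon^{i-1}}{(i-1)!}\, W_i(x)$ (2)

into another differential equation

$\frac{dy}{dt} = f^*(y,\varepsilon) = \sum_{i \ge 0} \frac{\varepsilon^i}{i!} \bigl( f_{H\,0}^{(i)}(y) + f_{NH\,0}^{(i)}(y) \bigr)$, (3)

such that y = (q', p') is also a set of canonical variables with coordinates q'
and momenta p'. Now, the terms $f_{H\,0}^{(i)}$, $i \ge 1$, are obtained by
calculating $f_{H\,0}^{(i)} = J \cdot \mathrm{grad}_y \mathcal H_0^{(i)}$, with J the
symplectic matrix $J = \begin{pmatrix} 0 & I \\ -I & 0 \end{pmatrix}$, I the identity
matrix, and $\mathcal H_0^{(i)}$ obtained by using the algorithm of Lie transforms
for Hamiltonians

$\mathcal H_j^{(i)} = \mathcal H_{j+1}^{(i-1)} + \sum_{0 \le k \le j} \binom{j}{k} \{ \mathcal H_{j-k}^{(i-1)} ; V_{k+1} \}$ (4)

for $j \ge 0$, $i \ge 1$, with $\{\cdot\,;\cdot\}$ the Poisson bracket and
$V(q,p,\varepsilon) = \sum_{i \ge 1} \frac{\varepsilon^{i-1}}{(i-1)!}\, V_i$ a scalar
generating function. Finally, the terms $f_{NH\,0}^{(i)}$, $i \ge 1$, are
calculated with

$f_{NH\,j}^{(i)} = f_{NH\,j+1}^{(i-1)} + \sum_{0 \le k \le j} \binom{j}{k} \Bigl( \bigl(\mathcal L^{H}_{k+1} + \mathcal L^{NH}_{k+1}\bigr) f_{NH\,j-k}^{(i-1)} + \mathcal L^{NH}_{k+1} f_{H\,j-k}^{(i-1)} \Bigr)$

also for $j \ge 0$, $i \ge 1$. The Lie operators $\mathcal L^{H}$ and
$\mathcal L^{NH}$ are defined by

$\mathcal L_i^{\circ} s = \frac{\partial s}{\partial x} \cdot W_i^{\circ} - \frac{\partial W_i^{\circ}}{\partial x} \cdot s$, with $\circ = H$ or $NH$. (5)

Besides, now $W^{H} = J \cdot \mathrm{grad}_x V$ is the generating function built from V.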



It is important to remark that, in general, the generating function is unknown
and has to be obtained order by order by solving a linear system of first order
partial differential equations, the “homological equation” [13]:

$f_0^{(j)} = F + [\, f_0^{(0)} ; W_j \,]$,

where F is a function of the previous orders, $f_0^{(j)}$ is taken as the
average of F, and $[\,\cdot\,;\cdot\,]$ stands for the Poisson bracket or the Lie
operators, depending on the nature of the terms. The solvability of the
homological equation can be assured if we assume that $f_0^{(0)}$ possesses
several properties [13]. In our case, a suitable choice of the canonical set of
variables allows the solution of the homological equation to be reduced to
quadratures.

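Once we have found the generating function W, it is possible to obtain the
direct and inverse transformations of the variables:

Proposition 1. [9] The direct transformation $x = \sum_{i \ge 0} \frac{\varepsilon^i}{i!}\, y_0^{(i)}(y)$ is given by

$y_j^{(i)}(y) = y_{j+1}^{(i-1)}(y) + \sum_{0 \le k \le j} \binom{j}{k} \mathcal L_{k+1}\, y_{j-k}^{(i-1)}(y)$, where $\mathcal L_j s = \frac{\partial s}{\partial y} \cdot W_j$,

with $y_0^{(0)} = y$, $y_j^{(0)} = 0$ (j > 0), and the inverse transformation
$y = \sum_{i \ge 0} \frac{\varepsilon^i}{i!}\, x_i^{(0)}(x)$ by

$x_j^{(i)}(x) = x_{j-1}^{(i+1)}(x) - \sum_{0 \le k \le j-1} \binom{j-1}{k} \mathcal L_{k+1}\, x_{j-k-1}^{(i)}(x)$, where $\mathcal L_j s = \frac{\partial s}{\partial x} \cdot W_j$,

with $x_0^{(0)} = x$, $x_0^{(i)} = 0$ (i > 0).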

An important property of the Lie-Deprit method is that, applied to Hamiltonian
systems, it generates a canonical transformation [7]. As a consequence, the
modified method (Theorem 1) also generates a canonical transformation when
applied only to the Hamiltonian part, and thus the composition of the
perturbation theory and a symplectic numerical integration scheme will generate
a symplectic seminumerical theory. The problem is that the systems suitable for
averaging (periodic standard form or angular standard form) are usually not
separable Hamiltonians and, then, the symplectic integrators are implicit.
Besides, we are interested in the integration of differential systems with
Hamiltonian and non-Hamiltonian perturbations. Thus, in this paper we will not
consider symplectic integrators; instead, we use a particular family of
symmetric methods that also have interesting qualitative properties.

3 Collocation Method 

In this section we formulate a collocation method for the solution of the averaged
system

$\frac{dy}{dt} = f^*(y,\varepsilon), \quad y(t_0) = y_0.$ (6)

The formulation presented here consists of calculating, on each integration
step, an approximation of the solution by means of the interpolation polynomial
at the extrema of a Chebyshev polynomial of the first kind. Thus, the solution
is given by means of the coefficients of this collocation polynomial.

This formulation [4] follows the idea, used by Clenshaw, of approximating
the second member of the differential equation on each integration step
$[t_k, t_{k+1}]$ at the initial conditions by means of a finite series of
Chebyshev polynomials of the first kind $\{T_j(u)\}$, that is to say,

$f^*(y(u),\varepsilon) \simeq {\sum_{j=0}^{n-1}}' c_j\, T_j(u)$, with $-1 \le u \le 1$, (7)

where u is given by the map $u = ((t - t_k) - (t_{k+1} - t))/(t_{k+1} - t_k)$, in
order to use the standard interval [-1, 1]. The prime in the sum symbol means
that the first term in the series must be halved.

An approximation of the Fourier-Chebyshev coefficients is obtained by
means of numerical calculation of the quadratures. In our case the coefficients
are computed with the Gauss-Lobatto formula

$c_j = \frac{2}{n-1} {\sum_{i=0}^{n-1}}'' f^*(y(u_i),\varepsilon)\, T_j(u_i)$, (8)

where $u_i = \cos(i\pi/(n-1))$ are the extrema of $T_{n-1}(u)$ and the double
prime means that the first and last terms must be halved.

Once the second member of the differential system is approximated, we
integrate the series to obtain an approximation of the solution

$y(u) \simeq {\sum_{j=0}^{n}}' a_j\, T_j(u)$, (9)

where the coefficients $a_j$ are obtained by using the recursive formulas for
the integration of the Chebyshev polynomials

$a_n = \frac{t_{k+1} - t_k}{2}\, \frac{c_{n-1}}{2n}$, $a_{n-1} = \frac{t_{k+1} - t_k}{2}\, \frac{c_{n-2}}{2(n-1)}$,

$a_r = \frac{t_{k+1} - t_k}{2}\, \frac{1}{2r}\, (c_{r-1} - c_{r+1})$, for $1 \le r \le n-2$. (10)

The first coefficient $a_0$ is calculated by using the initial conditions $y_0$
of the problem on the integration step $[t_k, t_{k+1}]$ through

$\frac{1}{2} a_0 = a_1 - a_2 + a_3 - \dots + (-1)^{n+1} a_n + y_0$. (11)

Note that in Eq. (8) the values of the solution y(t) are required. However, the
function y(t) is unknown; hence the method is implicit. Therefore, an iterative
method is needed, as well as a good initial estimate of the solution to begin
with. Besides, since the collocation methods are based on approximations of the
right-hand member of Eq. (6), the smaller its variations, the better the
convergence. In our problem, as the differential system is the averaged one, the
variations of the second member of the differential system are very small and,
therefore, a very low number of iterations (1 or 2) is needed. Besides, we can
take very large stepsizes with a low number of terms in the series.
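One step of such an implicit Chebyshev collocation scheme can be sketched as
follows for a scalar equation y' = f(t, y). This is an illustration under
assumptions, not the ChRK code of the paper: the function names `chrk_step` and
`cheb_eval` are hypothetical, a plain fixed-point iteration with a fixed number
of sweeps replaces whatever iteration and stopping rule the authors use, and the
coefficient formulas follow the Gauss-Lobatto quadrature, the term-by-term
integration of the series and the initial-condition relation as reconstructed
above.

```python
import math

def cheb_eval(a, u):
    """Evaluate a[0]/2 + sum_{r>=1} a[r] T_r(u) for -1 <= u <= 1."""
    theta = math.acos(max(-1.0, min(1.0, u)))
    return 0.5 * a[0] + sum(a[r] * math.cos(r * theta) for r in range(1, len(a)))

def chrk_step(f, t0, y0, h, n=7, iterations=30):
    """Return Chebyshev coefficients a[0..n] of the solution on [t0, t0 + h]."""
    big_n = n - 1
    u = [math.cos(i * math.pi / big_n) for i in range(n)]   # extrema of T_{n-1}
    t = [t0 + 0.5 * h * (ui + 1.0) for ui in u]
    y = [y0] * n                                            # initial guess
    a = [0.0] * (n + 1)
    for _ in range(iterations):
        rhs = [f(ti, yi) for ti, yi in zip(t, y)]
        # Gauss-Lobatto quadrature: first and last terms halved.
        c = []
        for j in range(n):
            s = 0.0
            for i in range(n):
                w = 0.5 if i in (0, big_n) else 1.0
                s += w * rhs[i] * math.cos(j * i * math.pi / big_n)
            c.append(2.0 * s / big_n)
        c += [0.0, 0.0]                                     # c_n = c_{n+1} = 0
        # Term-by-term integration of the series, scaled by h/2.
        for r in range(1, n + 1):
            a[r] = 0.5 * h * (c[r - 1] - c[r + 1]) / (2.0 * r)
        # Initial condition at u = -1, where T_r(-1) = (-1)^r.
        a[0] = 2.0 * (y0 + sum((-1) ** (r + 1) * a[r] for r in range(1, n + 1)))
        y = [cheb_eval(a, ui) for ui in u]
    return a

# Usage: y' = y, y(0) = 1 on one step of length 1; the dense output is a polynomial.
coeffs = chrk_step(lambda t, y: y, 0.0, 1.0, 1.0, n=8)
y_end = cheb_eval(coeffs, 1.0)     # approximates e
y_mid = cheb_eval(coeffs, 0.0)     # approximates exp(0.5)
```

The returned coefficients give the solution at any point of the step, which is
the dense output property exploited later when recovering osculating elements.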

These methods (ChRK) have several properties. Among them, it is interesting to
remark that they are Runge-Kutta collocation methods, are A-stable [3],
generate P-stable indirect Runge-Kutta-Nyström collocation methods for special
second-order initial-value problems, and exhibit linear growth in time of the
global error for time-reversible systems due to their symmetric structure. Other
interesting features are that they can be easily formulated using variable
stepsizes and in a matrix form suitable for parallel implementation.

4 Seminumerical Integration Scheme 

The combination of the analytical theories (to obtain the averaged system and
the averaged initial conditions, and to recover the osculating elements from the
averaged ones) with the RK collocation method with a Chebyshev series as output
(ChRK) gives us a seminumerical method that quickly computes (for low
precision) an “analytical” solution of the differential system.

Seminumerical Integration Scheme

Step 1: Determination of the averaged system: $f \to f^*$ (Theorem 1).
Step 2: Numerical integration
    2-i: Averaged initial conditions: $x_0 \to y_0$ (Proposition 1).
    2-ii: Numerical integration (ChRK): $y(t) \simeq {\sum_{i=0}^{n}}' a_i\, T_i(u)$.
Step 3: Recovering of osculating elements: $y(t) \to x(t)$ (Proposition 1).

5 Application to the Artificial Satellite Problem



The scheme presented in this paper has been applied to the artificial satellite
problem, modeled as the two-body equations perturbed by the $J_2$ term of the
Earth potential and by the air drag (only for low satellites). The analytical
transformations are based on Theorem 1 and have been taken from [2].

One of the first things to do is the selection of the canonical set of variables.
Here, as in [6], we use the Delaunay variables $\{\ell, g, h, L, G, H\}$. As a
consequence, the homological equation (for the vector components), which must
be solved by computing some quadratures with respect to the mean anomaly
$\ell$, is

$3a^{-2} (W_j)_4 + n\, \frac{\partial (W_j)_1}{\partial \ell} = F_1 - (f_0^{(j)})_1$,

$n\, \frac{\partial (W_j)_i}{\partial \ell} = F_i - (f_0^{(j)})_i$, for $2 \le i \le 6$, (12)

where $n = \sqrt{\mu / a^3}$, a is the semimajor axis of the orbit, F is obtained
in the preceding steps and $f_0^{(j)}$ is chosen according to the simplification
(the averaging), that is, removing the mean anomaly (fast angle variable):

$f_0^{(j)} = \frac{1}{2\pi} \int_0^{2\pi} F(\ell, g, h, L, G, H)\, d\ell$.



5.1 Numerical Tests 

In the numerical tests we have applied the seminumerical scheme to a low altitude
satellite that we call Low and a geostationary type satellite that we call Geo.
The analytical theory used in the simulations has 145 terms in the averaged
equations up to second order in the small parameter $\varepsilon \sim J_2$, and
the generator has 936 terms. The direct and inverse transformations have 1123
terms. Let us note that for a complete analytical theory of an artificial
satellite it is usual to have thousands or even millions of terms [1].

Table 1. Integration time (T), number of revolutions (NR), number of steps
(NS), number of function evaluations (NF) and CPU time in seconds using the
ChRK and DOP853 [8] in the seminumerical integration of the Low and Geo
orbits and in the numerical integration of the non-averaged equations (DOP853*)

Orbit  T          NR      DOP853          ChRK           DOP853*
                          NS/NF    CPU    NS/NF   CPU    NS/NF              CPU
Low    30 days    467     12/182   0.78   5/112   0.02   6,698/94,188       7.69
Geo    1 year     365     12/182   1.60   5/98    0.02   1,510/21,028       2.24
Geo    100 years  36,500  17/257   1.74   8/140   0.09   152,691/2,118,395  106.98

Fig. 1. Figures on the top: evolution of the osculating and mean values for the
semimajor axis a (in km) and the eccentricity e for 30 days. Figures on the
bottom: error in the semimajor axis a (in km) and in the eccentricity e for 30
days in a seminumerical integration of the Low orbit

In Figure 1 we show the evolution, in the osculating and averaged elements,
of the semimajor axis and the eccentricity of the Low orbit calculated with the
seminumerical scheme. In Figure 2 we show how the error depends on the initial
conditions, that is, on whether we use as initial conditions for the averaged
system the osculating ones (as several averaging methods do) or the transformed
mean initial conditions. The figures make clear the necessity of the transformation.
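The effect of the initial conditions can be reproduced on a toy problem. The
sketch below is not the satellite theory and does not use Lie transforms: it
assumes the scalar test equation x' = eps * x * (1 + sin t), whose averaged
equation is y' = eps * y, with the first-order near-identity transformation
x = y * (1 - eps * cos t). All names and the value of `EPS` are illustrative
assumptions.

```python
import math

EPS = 0.01  # small parameter (illustrative value)

def exact(x0, t):
    """Exact solution of x' = EPS * x * (1 + sin t), x(0) = x0."""
    return x0 * math.exp(EPS * (t + 1.0 - math.cos(t)))

def averaged(y0, t):
    """Solution of the averaged equation y' = EPS * y, y(0) = y0."""
    return y0 * math.exp(EPS * t)

def to_osculating(y, t):
    """First-order direct transformation (mean to osculating)."""
    return y * (1.0 - EPS * math.cos(t))

def to_mean(x, t):
    """First-order inverse transformation (osculating to mean)."""
    return x * (1.0 + EPS * math.cos(t))

def max_rel_error(y0, x0, times):
    """Largest relative error of the recovered solution against the exact one."""
    return max(
        abs(to_osculating(averaged(y0, t), t) - exact(x0, t)) / exact(x0, t)
        for t in times
    )

x0 = 1.0
times = [0.1 * k for k in range(1001)]                  # 0 <= t <= 1 / EPS
err_osculating_ic = max_rel_error(x0, x0, times)        # y0 = x0, no transformation
err_mean_ic = max_rel_error(to_mean(x0, 0.0), x0, times)
```

Starting the averaged system from the untransformed (osculating) value leaves an
error of order eps, whereas the transformed mean initial condition reduces it to
order eps squared, which mirrors the behaviour described for Figure 2.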

Finally, in Table 1 we show the number of steps and the CPU time on a
PC PII-333 MHz using the ChRK and a standard RK (DOP853 [8]). For the
ChRK we have taken n = 7, and in all steps only two iterations were needed, on
average, to reach the tolerance level. Also, for comparison, we present the
CPU time for the DOP853 with the non-averaged equations. All the tests have
been done for a relative error of 10“^ in the variable stepsize implementations 
(we remark that we have not used reversible stepsize strategies, although they
would be desirable given the symmetric nature of the ChRK). From the table the
difference between using averaged and non-averaged equations is clear. Besides,
we remark the good performance of the implicit ChRK compared with the explicit
DOP853, due to the smoothness of the averaged equations and because, for the
ChRK, Step 3 (recovering of osculating elements) is done after the integration
and only at the required points by means of the dense polynomial solution.

Fig. 2. Errors in the semimajor axis (km) against time (days) in a seminumerical
integration scheme by using osculating ($x_0$) and mean elements ($y_0$) as
initial conditions for the Low orbit

As conclusions, we remark that the combined use of analytical theories and
collocation integrators yields very fast integrators (seminumerical
integration schemes) that also give us an “analytical” solution as a polynomial.

Abstract. In this paper we study the rounding errors in the parallel
evaluation of a finite series of a general family of orthogonal polynomials.

Both the theoretical bounds and the numerical tests show almost the same
behavior for the sequential and the parallel algorithms.

1 Introduction 

The evaluation of polynomials is one of the most common problems in scientific 
computing and, with the development of parallel computers, it is interesting to 
design parallel algorithms to evaluate polynomials. Recently general algorithms 
for the parallel evaluation of polynomials written as a finite series of general 
orthogonal polynomials have been proposed in [1,3]. 

Usually the parallel algorithms are more unstable than the sequential algo- 
rithms for the same problem, but, for particular triangular systems the parallel 
algorithms possess stability properties similar to those of the Gaussian elimi- 
nation [7]. Thus, an important task is the stability analysis of the new parallel 
algorithms. 

In this paper we analyse the stability of the parallel algorithms for the evalua- 
tion of polynomials given in [1,3]. The analysis shows that the parallel algorithms 
are almost as stable as their sequential counterparts for practical applications. 
Extensive numerical experiments applied to Jacobi polynomials series confirm 
the theoretical conclusions. 

The algorithms that we study evaluate finite series $P_n(x) = \sum_{r=0}^{n} c_r \Phi_r(x)$ of
a family of orthogonal polynomials $\{\Phi_k(x)\}$ which satisfy the triple
recurrence relation

$\Phi_0(x) = 1$, $\Phi_1(x) = \alpha_1(x)$,
$\Phi_k(x) - \alpha_k(x) \Phi_{k-1}(x) - \beta_k \Phi_{k-2}(x) = 0$, $k \ge 2$, (1)

with $\alpha_k(x)$ a linear polynomial of x.
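For reference, the two sequential algorithms that the parallel variants build on
can be sketched for a generic triple recurrence of this form. The code is a
hedged illustration: the function names and the callable interface
`alpha(k, x)`, `beta(k)` are assumptions made here, and the Chebyshev
coefficients used in the example are just one convenient family for checking
the result.

```python
import math

def forsythe(c, alpha, beta, x):
    """Forward evaluation of sum_r c[r] * Phi_r(x) with the triple recurrence."""
    n = len(c) - 1
    phi_prev, phi = 1.0, alpha(1, x)            # Phi_0 and Phi_1
    total = c[0] * phi_prev
    if n >= 1:
        total += c[1] * phi
    for k in range(2, n + 1):
        phi_prev, phi = phi, alpha(k, x) * phi + beta(k) * phi_prev
        total += c[k] * phi
    return total

def clenshaw(c, alpha, beta, x):
    """Backward recurrence q_k = c_k + alpha_{k+1} q_{k+1} + beta_{k+2} q_{k+2}.

    With Phi_0 = 1 and Phi_1 = alpha_1 the value of the series is q_0.
    """
    q1, q2 = 0.0, 0.0                           # q_{k+1} and q_{k+2}
    for k in range(len(c) - 1, -1, -1):
        q1, q2 = c[k] + alpha(k + 1, x) * q1 + beta(k + 2) * q2, q1
    return q1

# Example: Chebyshev polynomials, T_1 = x and T_k = 2x T_{k-1} - T_{k-2}.
def cheb_alpha(k, x):
    return x if k == 1 else 2.0 * x

def cheb_beta(k):
    return -1.0

coeffs = [1.0 / (i + 1) ** 2 for i in range(20)]
x = 0.3
reference = sum(ci * math.cos(i * math.acos(x)) for i, ci in enumerate(coeffs))
value_clenshaw = clenshaw(coeffs, cheb_alpha, cheb_beta, x)
value_forsythe = forsythe(coeffs, cheb_alpha, cheb_beta, x)
```

The backward recurrence is exactly the triangular system Sq = c used in the
next section: each row reads q_k - alpha_{k+1} q_{k+1} - beta_{k+2} q_{k+2} = c_k.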

* The first author is supported partially by the Spanish Ministry of Education and
Science (Project ESP99-1074-C02-01) and by the Centre National d’Etudes
Spatiales at Toulouse (France). The second author is supported partially by Grants
MM-707/97 and 1-702/97 from the Bulgarian Ministry of Education and Science.

Let us assume the standard model of roundoff arithmetic with a guard
digit [5]:

$fl(x * y) = (x * y)(1 + \sigma)$, $|\sigma| \le \rho_0$, $* \in \{+, -, \times, /\}$,

where $\rho_0$ is the machine precision. Also we denote
$\gamma_n := n\rho_0 / (1 - n\rho_0) = n\rho_0 + O(\rho_0^2)$. In the following, a tilde
denotes computed results.

2 Parallel Algorithms 

In [1] four parallel algorithms were presented to evaluate a series of a
general family of orthogonal polynomials. Two algorithms, PC and PF, are based
on parallel methods applied to the matrix formulation of the sequential
algorithms of Clenshaw and Forsythe. The other two algorithms, MPC and MPF, are
based on a matrix product formulation of the recurrences of the sequential
algorithms.

Let us briefly describe the PC algorithm in a form suitable for the stability
analysis [4]. It is based on the sequential Clenshaw algorithm, which can be
written as the solution of a tridiagonal triangular linear system Sq = c, where
S is given below and c is the vector of coefficients of the polynomial series.
In the following we suppose that n = kp, with p the number of processors.

For the purpose of the stability analysis of the parallel algorithms, the
entries of S are rearranged in order to make the analysis easier. Before the
permutation, the matrix S is as follows:



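$S = \begin{pmatrix} A_1 & B_1 & & & & & \\ & C_1 & D_1 & & & & \\ & & A_2 & B_2 & & & \\ & & & C_2 & D_2 & & \\ & & & & \ddots & \ddots & \\ & & & & & A_p & B_p \\ & & & & & & C_p \end{pmatrix}$

where $A_i \in \mathbb{R}^{2 \times 2}$ and $B_i \in \mathbb{R}^{2 \times (k-2)}$ have the following structure:

$A_i = \begin{pmatrix} 1 & -\alpha_{(i-1)k+2} \\ 0 & 1 \end{pmatrix}$, $B_i = \begin{pmatrix} -\beta_{(i-1)k+3} & 0 & 0 & \dots & 0 \\ -\alpha_{(i-1)k+3} & -\beta_{(i-1)k+4} & 0 & \dots & 0 \end{pmatrix}$,

$C_i \in \mathbb{R}^{(k-2) \times (k-2)}$ is upper triangular tridiagonal like the original matrix S,
and $D_i \in \mathbb{R}^{(k-2) \times 2}$ looks as follows:

$C_i = \begin{pmatrix} 1 & -\alpha_{(i-1)k+4} & -\beta_{(i-1)k+5} & & \\ & 1 & -\alpha_{(i-1)k+5} & \ddots & \\ & & \ddots & \ddots & -\beta_{ik} \\ & & & 1 & -\alpha_{ik} \\ & & & & 1 \end{pmatrix}$, $D_i = \begin{pmatrix} 0 & 0 \\ \vdots & \vdots \\ 0 & 0 \\ -\beta_{ik+1} & 0 \\ -\alpha_{ik+1} & -\beta_{ik+2} \end{pmatrix}$.

The coefficients $\{\alpha_i\}$ and $\{\beta_i\}$ are the coefficients of the triple recurrence
relation (1) that defines the particular family of orthogonal polynomials that we
use.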

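In the parallel algorithm we permute the rows and columns of S in such a
way that the permuted matrix is as follows:

$S = \begin{pmatrix} C & D \\ B & A \end{pmatrix}$,

where $C = \mathrm{diag}\{C_1, \dots, C_p\}$, $A = \mathrm{diag}\{A_1, \dots, A_p\}$, $B = \mathrm{diag}\{B_1, \dots, B_p\}$, and

$D = \begin{pmatrix} 0 & D_1 & & \\ & \ddots & \ddots & \\ & & 0 & D_{p-1} \\ & & & 0 \end{pmatrix}$.

By using the introduced block structure the PC algorithm can be given in
the following way:

Step 1: Compute in parallel S = LU, where

    $L = \begin{pmatrix} C & 0 \\ B & I_{2p} \end{pmatrix}$, $U = \begin{pmatrix} I_{n-2p} & C^{-1}D \\ 0 & A - BC^{-1}D \end{pmatrix}$.

Step 2: Solve Ly = c.

Step 3: Solve Uq = y.

Step 4: $\sum_{r=0}^{n} c_r \Phi_r(x) = \beta_2\, q_{p(k-2)+2} + q_{p(k-2)+1}\, \Phi_1(x) + c_0$.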

In [3] two other parallel algorithms are proposed (ChPC and ChPF) which
are suitable for the parallel evaluation of Chebyshev series. These algorithms
are much more efficient than the general parallel algorithms described above
because they are especially designed for the evaluation of a polynomial series of
a particular family of orthogonal polynomials, i.e., Chebyshev polynomials. As
above, the ChPC algorithm can be formulated by using a block matrix notation.
Let us define the block diagonal matrix

$\mathbf{T} = \mathrm{diag}\{\mathbf{T}_p, \mathbf{T}_p, \dots, \mathbf{T}_p\}$ (p times), (2)

with $\mathbf{T}_p \in \mathbb{R}^{k \times k}$ and $T_p(x)$ the Chebyshev polynomial of degree p,

$\mathbf{T}_p = \begin{pmatrix} 1 & -2T_p(x) & 1 & & \\ & \ddots & \ddots & \ddots & \\ & & 1 & -2T_p(x) & 1 \\ & & & 1 & -2T_p(x) \\ & & & & 1 \end{pmatrix}$. (3)

Also we define the vectors $e_{p+1}, T_{0:p} \in \mathbb{R}^{p+1}$ and q, c, given by
$e_{p+1} = (0, \dots, 0, 1)^T$, $T_{0:p} = (T_0(x), \dots, T_p(x))^T$ (the values of the
Chebyshev polynomials) and $c = (\mathbf{c}_1^T, \mathbf{c}_2^T, \dots)^T$ with
$\mathbf{c}_i = (c_i, c_{i+p}, c_{i+2p}, \dots, c_{i+(k-1)p})^T$ (the vector of polynomial
coefficients). Besides, for the initialization process we need the matrix
$\mathbf{T}_1 \in \mathbb{R}^{(p+1) \times (p+1)}$

$\mathbf{T}_1 = \begin{pmatrix} 1 & -2x & 1 & & \\ & \ddots & \ddots & \ddots & \\ & & 1 & -2x & 1 \\ & & & 1 & -x \\ & & & & 1 \end{pmatrix}$. (4)

Thus the ChPC algorithm can be rewritten as:

Step 1: Solve $\mathbf{T}_1\, T_{0:p} = e_{p+1}$.

Step 2: Solve in parallel $\mathbf{T} q = c$.

Step 3: $\sum_{r=0}^{n} c_r T_r(x) = \sum_{i} \bigl( q_{ik+1}\, T_i(x) - q_{ik+2}\, T_{p-i}(x) \bigr)$.
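The idea behind this splitting can be sketched sequentially: the coefficients
are dealt into p interleaved subsequences, each one is processed by an
independent Clenshaw-type recurrence driven by T_p(x), and the partial results
are combined with T_i(x) and T_{p-i}(x). The sketch relies on the identity
T_{m+p}(x) = 2 T_p(x) T_m(x) - T_{m-p}(x) and is an assumption-based
illustration: the function names are hypothetical, the loop over i stands in for
the p processors, and no claim is made that this matches the implementation
details of [3].

```python
import math

def cheb_values(p, x):
    """T_0(x), ..., T_p(x) by the three-term recurrence (p >= 1)."""
    values = [1.0, x]
    for _ in range(2, p + 1):
        values.append(2.0 * x * values[-1] - values[-2])
    return values[: p + 1]

def chpc_like(c, p, x):
    """Evaluate sum_r c[r] T_r(x) with p independent backward recurrences."""
    t = cheb_values(p, x)
    total = 0.0
    for i in range(p):                       # each i could run on its own processor
        sub = c[i::p]                        # c_i, c_{i+p}, c_{i+2p}, ...
        b0, b1 = 0.0, 0.0
        for cj in reversed(sub):
            b0, b1 = cj + 2.0 * t[p] * b0 - b1, b0
        # Combination step: contribution of the i-th subsequence.
        total += b0 * t[i] - b1 * t[p - i]
    return total

coeffs = [1.0 / (r + 1) ** 2 for r in range(32)]
x = 0.3
reference = sum(cr * math.cos(r * math.acos(x)) for r, cr in enumerate(coeffs))
values = {p: chpc_like(coeffs, p, x) for p in (1, 2, 4, 8)}
```

For p = 1 the scheme reduces to the ordinary Clenshaw algorithm for Chebyshev
series; for larger p each recurrence is k times shorter, which is the source of
the parallel speed-up.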



3 Rounding Error Bounds



In [4] the accumulation of rounding errors for the PC and PF algorithms is
studied, and in [2] the ChPC and ChPF algorithms are analyzed in a similar
way. Below we present some results that state relative forward error bounds for
some of these algorithms. It is interesting to remark that in the following results
we use Skeel’s componentwise condition number $\mathrm{cond}(M) = \|\, |M^{-1}|\, |M|\, \|_\infty$
instead of $\mu(M) = \|M^{-1}\|_\infty \|M\|_\infty$.



Theorem 1. [4] The relative normwise forward error in the solution q of the system
Sq = c obtained by the PC algorithm is bounded as follows:

$\frac{\|\delta q\|_\infty}{\|q\|_\infty} \le \gamma_{18}\, \mathrm{cond}(S)\, \frac{\|C^{-1}D\|_\infty}{1 - \gamma_2\, \mathrm{cond}(C)}$.

This theorem is applied to the particular family of Gegenbauer polynomials [6].
It is found in [4] that the PC algorithm is almost insensitive to the number
of processors inside the interval (-1,1), while the rounding error of the parallel
PC (near x = ±1) and PF (for all x in [-1, 1]) algorithms decreases when p grows.



Theorem 2. [2] The relative normwise forward error in the solution q of the system
Tq = c obtained by the ChPC algorithm is bounded as follows:

$\frac{\|\delta q\|_\infty}{\|q\|_\infty} \le \rho_0 \cdot \min\Bigl\{ 2 + 4(p+1)(p+2),\ 2 + \frac{8(p+1)}{\sqrt{1 - x^2}} \Bigr\}\, \mathrm{cond}(T_p) + O(\rho_0^2)$.

A detailed analysis [2] of this result tells us that the rounding errors are almost
the same in sequential and in parallel, except for a special set of points
$\{\cos(i\pi/p) \mid i = 1, \dots, p-1\}$ where the rounding error grows in the parallel algorithm.

3.1 Numerical Tests

In the theoretical analysis the MPC and MPF algorithms are not studied.
Besides, the PC and PF algorithms are only studied in detail [4] for the
Gegenbauer polynomials. Therefore, the goals of the present paper are to compare
the behaviour of a Gegenbauer family and a Jacobi family of orthogonal
polynomials and to study the MPC and MPF algorithms.

We have tested the PC, PF, MPC and MPF algorithms in order to analyze
the effects of rounding errors. In the simulations we have studied the algorithms
with Jacobi polynomial series $p_n(x) = \sum_{i=0}^{n} c_i P_i^{(\alpha,\beta)}(x)$ of degree n = 4096. For

each type of series we have used two sets of coefficients: set S1 of monotonically
decreasing coefficients ($c_i = 1/(i+1)^2$) and set S2 of random coefficients
normally distributed with mean 0 and variance 1. All the tests are done on a 
workstation SuN ULTRASPARC 1 and the programs have been written in For- 
tran 77 in double precision with unit roundoff po — 2.2 x 10“^®. 
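
As a point of reference for the sequential case, the following is a minimal sketch of Clenshaw's backward recurrence for a Legendre series (the Gegenbauer member used in Fig. 1). It is not the authors' Fortran code: the function name `clenshaw_legendre` is hypothetical, and the S1 coefficients are taken as $c_i = 1/(i+1)^2$, which is an assumption about the garbled exponent in the source. The result is compared with NumPy's own `legval`.

```python
import numpy as np
from numpy.polynomial import legendre


def clenshaw_legendre(c, x):
    """Evaluate sum_i c[i] * P_i(x) with Clenshaw's backward recurrence."""
    n = len(c) - 1
    b1 = 0.0  # plays the role of b_{k+1}
    b2 = 0.0  # plays the role of b_{k+2}
    for k in range(n, -1, -1):
        # Legendre recurrence: P_{k+1} = (2k+1)/(k+1) x P_k - k/(k+1) P_{k-1}
        alpha = (2 * k + 1) / (k + 1) * x   # alpha_k(x)
        beta = -(k + 1) / (k + 2)           # beta_{k+1}
        b0 = c[k] + alpha * b1 + beta * b2
        b2, b1 = b1, b0
    # b_0 is the value of the series because P_0 = 1 and P_1 = alpha_0(x) P_0
    return b1


# Set S1 (assumed form of the coefficients), degree n = 4096
c_s1 = 1.0 / (np.arange(4097) + 1.0) ** 2
value = clenshaw_legendre(c_s1, 0.3)
reference = legendre.legval(0.3, c_s1)
```

The parallel PC and PF variants studied in the paper split this recurrence among p processors; the sketch only covers the one-processor baseline.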



[Plot panels: p = 1 and p = 32; curve labels: Clenshaw, Forsythe, PC, PF, MPC, MPF.]

Fig. 1. Error in the evaluation of Legendre series (set S1) on one processor and
error ratio between the rounding error on p processors and on one processor



In Figure 1 we show the rounding error ratio between parallel and sequential
algorithms in the evaluation of a Legendre series. We can see that when the
number of processors grows the performance of all the algorithms is similar. Only
for the MPF algorithm do the rounding errors decrease much more slowly than for
the PF one. In Figures 2, 3, 4 and 5 we compare the performance for two Jacobi series.
In the examples with set S1 the behavior is essentially the same as for Legendre
series (which are members of the Gegenbauer family), while for the set S2 the
Fig. 2. Figure on the top: relative average rounding error in the evaluation of a
series of Jacobi polynomials {a = -\/2/10 + l, P = v^/lO + l) with monotonically 
decreasing coefficients. The rest of the figures show the ratio between the average 
rounding error on p processors and on one processor 







Fig. 3. Error ratio between the average rounding error on p processors and on 
one processor in the evaluation of a series of Jacobi polynomials {a = v^/lO, /3 = 
$\sqrt{2}/10$) with monotonically decreasing coefficients
Fig. 4. Figure on the top: relative average rounding error in the evaluation of
a series of Jacobi polynomials {a = v^/lO + 1, /3 = v^/lO + 1) with random 
coefficients. The rest of the figures show the ratio between the average rounding 
error on p processors and on one processor 



performance of all the algorithms is similar. Besides, the growth of the parameters
$(\alpha, \beta)$ that define the family of Jacobi polynomials does not seem to influence the
performance.

Finally, in Figure 6 we show the evolution of the rounding error ratio
depending on the number of processors p in the evaluation of a Legendre series.

From the numerical tests shown in the figures we conclude that the behaviour
detected for the Gegenbauer polynomials in [4] can be extended to the Jacobi
polynomial series and that the behaviour of the PC and MPC algorithms is
similar. A more detailed analysis is needed for the MPF algorithm.
Fig. 5. Error ratio between the average rounding error on p processors and on
one processor in the evaluation of a series of Jacobi polynomials {a = v^/lO, f3 = 
>/2/10) with random coefficients 




Fig. 6. Error ratio between the rounding error in parallel and in sequential in
the evaluation of Legendre series at the point x = 0. with the sets S1 and S2

Abstract. A sequence of least squares problems of the form
$\min_y \|G^{1/2}(A^T y - h)\|_2$, where G is an n x n positive definite diagonal weight
matrix and A is an m x n ($m \le n$) sparse matrix with some dense columns,
has many applications in linear programming, electrical networks, elliptic
boundary value problems, and structural analysis. We discuss a technique
for forming low-rank correction preconditioners for such problems.
Finally we give numerical results to illustrate this technique.

Keywords: Least squares, conjugate gradients, preconditioner, dense
columns.



1 Introduction 

Consider a sequence of weighted least squares problems of the form

$$\min_{y} \|A^T y - h\|_G, \tag{1}$$

where $y \in \mathbb{R}^m$, $h \in \mathbb{R}^n$, $G \in \mathbb{R}^{n \times n}$ is a positive definite diagonal weight matrix,
and $A \in \mathbb{R}^{m \times n}$ ($m \le n$) is a sparse matrix with some dense columns. The
weight matrix G and the vector h vary from one computation step (in interior
point algorithms, a computation step is equivalent to an interior point iteration) to
another while A is kept constant. Throughout this paper, we assume A to have
full rank m. Define $\|(\cdot)\|_G = \|G^{1/2}(\cdot)\|_2$. Then the solution of (1) is given by the
normal equations

$$A G A^T y = v, \tag{2}$$

where $v = AGh$. Let $A = [A_1, A_2]$ be the matrix whose columns have been
reordered into two block matrices $A_1$ and $A_2$. After reordering of A, let
$\mathcal{J}_1 = \{1, 2, \ldots, r\}$ and $\mathcal{J}_2 = \{r+1, r+2, \ldots, n\}$ be the column indices of A corresponding
to $A_1$ and $A_2$ respectively. Let G be partitioned such that $G_1$ and $G_2$ are block
(square) submatrices corresponding to $\mathcal{J}_1$ and $\mathcal{J}_2$ respectively. Then

$$A G A^T = A_1 G_1 A_1^T + A_2 G_2 A_2^T. \tag{3a}$$

In this notation, (2) becomes

$$(A_1 G_1 A_1^T + A_2 G_2 A_2^T)\, y = v. \tag{3b}$$

Let $H \in \mathbb{R}^{n \times n}$ be positive definite and diagonal. Likewise, we partition H
such that

$$A H A^T = A_1 H_1 A_1^T + A_2 H_2 A_2^T. \tag{3c}$$

The matrices $A_1$ and $A_2$ consist of the sparse columns in A and the dense columns
in A respectively. The matrix $A_1 H_1 A_1^T$ is the sparse part of the coefficient matrix
of the normal equations (or the sparse part of the preconditioner) from the
previous computation step with a known factorization.

Let $A_1 \in \mathbb{R}^{m \times r}$ ($r < n$) have full rank m. The main issue we want to address
in this paper is: given (3a) and (3c), how should we form an efficient low-rank
correction preconditioner?
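
To make the setting concrete, here is a small numerical sketch of (1), (2) and (3a). The sizes, the random data and the choice of which columns form $A_2$ are hypothetical; a dense random matrix stands in for the sparse matrix of the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, r = 4, 9, 6                      # hypothetical sizes; columns r..n-1 play the role of A_2
A = rng.standard_normal((m, n))
g = rng.uniform(0.1, 10.0, size=n)     # diagonal of the weight matrix G
h = rng.standard_normal(n)

A1, A2 = A[:, :r], A[:, r:]
G1, G2 = np.diag(g[:r]), np.diag(g[r:])

# (3a): A G A^T = A_1 G_1 A_1^T + A_2 G_2 A_2^T
AGAt = (A * g) @ A.T                   # A * g scales column j of A by g[j]
split = A1 @ G1 @ A1.T + A2 @ G2 @ A2.T

# (2): normal equations A G A^T y = A G h
y = np.linalg.solve(AGAt, (A * g) @ h)

# Reference: ordinary least squares on the scaled problem G^{1/2} A^T y ~ G^{1/2} h
sqrt_g = np.sqrt(g)
y_ref, *_ = np.linalg.lstsq(sqrt_g[:, None] * A.T, sqrt_g * h, rcond=None)
```

Both `y` and `y_ref` solve the same weighted problem; in practice the normal equations are solved iteratively, which is where the preconditioner enters.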

1.1 Organization and Notation 

In Section 2, we construct preconditioners based on low-rank corrections. Sec- 
tion 3 is on numerical experiments and Section 4 on concluding remarks. 

Throughout this paper we use the following notation. The symbol $\min_i$ or
$\max_i$ is for all i for which the argument is defined. For any matrix A, $A_{ij}$ is
the element in the i-th row and j-th column, and $A_j$ is the j-th column. The
symbol 0 is used to denote the number zero, the zero vector, and the zero matrix.
For any square matrix X with real eigenvalues, $\lambda_i(X)$ are the eigenvalues of X
arranged in nondecreasing order, and $\lambda_{\min}(X)$ and $\lambda_{\max}(X)$ denote the smallest and
largest eigenvalues of X respectively; i.e.,

$$\lambda_{\min}(X) = \lambda_1(X) \le \lambda_2(X) \le \cdots \le \lambda_m(X) = \lambda_{\max}(X).$$

We denote the spectral condition number of X by $\kappa(X)$ where by definition

$$\kappa(X) = \lambda_{\max}(X)/\lambda_{\min}(X).$$

The letters L and D represent unit lower Cholesky factor and positive definite
diagonal matrices respectively.



2 The Preconditioner 

To attain rapid convergence for conjugate gradient type methods we require
that either the spectral condition number of the preconditioned matrix be close
to one, so that the error bound based on the Chebyshev polynomial is small,
or the eigenvalues of the preconditioned matrix be strongly clustered. From
the computational point of view, we require that linear systems with the
preconditioner as coefficient matrix be easy to solve, and that the construction
cost of the preconditioner be low.

2.1 The Class of Preconditioners

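For a given index set

$$Q_1 \subseteq \{j : j \in \mathcal{J}_1 \text{ and } G_{jj} \ne H_{jj}\},$$

let the r x r diagonal matrices $D_1$ and $K_1$ be given by

$$(D_1)_{jj} = \begin{cases} G_{jj} - H_{jj} & \text{if } j \in Q_1 \\ 0 & \text{if } j \in \mathcal{J}_1 \setminus Q_1 \end{cases}$$

and

$$K_1 = H_1 + D_1. \tag{4}$$

Let $\bar A_1 \in \mathbb{R}^{m \times q_1}$, where $q_1 = |Q_1|$, consist of all columns $A_j$ such that $j \in Q_1$ and
let $\bar D_1 \in \mathbb{R}^{q_1 \times q_1}$ be the diagonal matrix corresponding to the nonzero diagonal
elements of $D_1$. In this notation,

$$A_1 K_1 A_1^T = A_1 (H_1 + D_1) A_1^T = A_1 H_1 A_1^T + \bar A_1 \bar D_1 \bar A_1^T,$$

namely, $A_1 K_1 A_1^T$ is a rank-$q_1$ correction of $A_1 H_1 A_1^T$.

Given the index set

$$Q_2 \subseteq \{j : j \in \mathcal{J}_2\},$$

let the (n - r) x (n - r) diagonal matrix $K_2$ and the n x n diagonal matrix K
be given by

$$(K_2)_{ss} = \begin{cases} G_{jj} & \text{if } j \in Q_2 \\ 0 & \text{if } j \in \mathcal{J}_2 \setminus Q_2 \end{cases}$$

and

$$K = \begin{bmatrix} K_1 & 0 \\ 0 & K_2 \end{bmatrix}, \tag{5}$$

where $K_1$ is defined in (4) and s = j - r. Let $\bar A_2 \in \mathbb{R}^{m \times q_2}$, where $q_2 = |Q_2|$,
consist of all columns $A_j$ such that $j \in Q_2$ and let $\bar G_2 \in \mathbb{R}^{q_2 \times q_2}$ be the diagonal
matrix corresponding to the nonzero diagonal elements of $K_2$. In this notation,

$$A_2 K_2 A_2^T = \bar A_2 \bar G_2 \bar A_2^T.$$

Thus the general class of preconditioners is given by

$$A K A^T = \left(A_1 H_1 A_1^T + \bar A_1 \bar D_1 \bar A_1^T\right) + \bar A_2 \bar G_2 \bar A_2^T. \tag{6}$$

Let $Q = Q_1 \cup Q_2$ and $q = q_1 + q_2$. The elements in the class of preconditioners
(6) are determined by the choice of the index set Q.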
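
The construction of K can be sketched numerically as follows. This is only an illustration under stated assumptions: indices are 0-based, the data are random, the helper name `build_K` is hypothetical, and the entries of $D_1$ on $Q_1$ are taken to be $G_{jj} - H_{jj}$, so that $K_1$ agrees with $G_1$ on $Q_1$ and with $H_1$ elsewhere.

```python
import numpy as np


def build_K(g, h, r, Q1, Q2):
    """Diagonal of K as in (4)-(5); columns r, r+1, ... form the dense block."""
    k = np.zeros_like(g)
    k[:r] = h[:r]        # K_1 = H_1 + D_1 starts from H_1 ...
    k[Q1] = g[Q1]        # ... and D_1 moves the entries in Q_1 to G_jj
    k[Q2] = g[Q2]        # K_2 keeps G_jj on Q_2 and is zero elsewhere
    return k


rng = np.random.default_rng(1)
m, n, r = 6, 12, 8
A = rng.standard_normal((m, n))
g = rng.uniform(0.1, 10.0, n)
h = rng.uniform(0.1, 10.0, n)
Q1 = np.array([0, 3, 5])     # subset of J_1 = {0, ..., r-1}
Q2 = np.array([9])           # subset of J_2 = {r, ..., n-1}

k = build_K(g, h, r, Q1, Q2)
AKAt = (A * k) @ A.T

# Right-hand side of (6): sparse part plus two low-rank corrections
A1 = A[:, :r]
sparse_part = (A1 * h[:r]) @ A1.T
low_rank = (A[:, Q1] * (g[Q1] - h[Q1])) @ A[:, Q1].T + (A[:, Q2] * g[Q2]) @ A[:, Q2].T
```

Here `AKAt` and `sparse_part + low_rank` agree, and the correction has rank at most $q = q_1 + q_2$.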

In interior point methods for linear programming, low-rank correction
preconditioners have been suggested (or discussed) [2, 3, 5, 6]. However, these papers
do not discuss the case when A contains some dense columns.

In applications where the problem matrix A contains some dense columns,
an effective low-rank correction preconditioner is of the form (6). In this paper
we establish bounds on the spectral condition number of the preconditioned
matrix $(AKA^T)^{-1}AGA^T$, where $AKA^T$ and $AGA^T$ are given by (6) and (3a)
respectively, and suggest how to form K.

Let the factorization $LDL^T = A_1 H_1 A_1^T$ be given. Then the linear system with the
preconditioner as coefficient matrix is of the form

$$\left(LDL^T + \bar A_1 \bar D_1 \bar A_1^T + \bar A_2 \bar G_2 \bar A_2^T\right) z = v. \tag{7}$$

For techniques for solving (7) efficiently, see for example [1].
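
One standard way to exploit the known factorization in a system of the form (7) is the Sherman-Morrison-Woodbury formula. The sketch below illustrates that generic formula and is not necessarily the technique of [1]; the function name `solve_low_rank_corrected`, the random data and the use of a Cholesky factor in place of $LDL^T$ are assumptions made for the example.

```python
import numpy as np


def solve_low_rank_corrected(solve_B, U, c, v):
    """Solve (B + U diag(c) U^T) z = v given a solver for B (Woodbury formula)."""
    Binv_v = solve_B(v)
    Binv_U = solve_B(U)
    # Small q-by-q matrix diag(1/c) + U^T B^{-1} U
    S = np.diag(1.0 / c) + U.T @ Binv_U
    return Binv_v - Binv_U @ np.linalg.solve(S, U.T @ Binv_v)


rng = np.random.default_rng(4)
m, n, r = 5, 12, 9
A = rng.standard_normal((m, n))
g = rng.uniform(0.5, 5.0, n)
h = rng.uniform(0.5, 5.0, n)
Q1 = np.array([0, 2])
Q2 = np.array([10])

A1 = A[:, :r]
B = (A1 * h[:r]) @ A1.T                       # sparse part A_1 H_1 A_1^T
Lc = np.linalg.cholesky(B)                    # stands in for the known factorization


def solve_B(X):
    """Apply B^{-1} using the stored factor."""
    return np.linalg.solve(Lc.T, np.linalg.solve(Lc, X))


U = np.hstack([A[:, Q1], A[:, Q2]])           # [A_1-bar, A_2-bar]
c = np.concatenate([g[Q1] - h[Q1], g[Q2]])    # diagonals of D_1-bar and G_2-bar
v = rng.standard_normal(m)
z = solve_low_rank_corrected(solve_B, U, c, v)
```

Only a q-by-q dense system is solved in addition to q + 1 applications of the existing factorization, which is why small q is desirable.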

2.2 Bounds on Eigenvalues of the Preconditioned Matrix 

Our interest is in bounding the spectral condition number of the preconditioned 
matrix. By the definition of the spectral condition number, this is equivalent to 
establishing a lower bound for the smallest eigenvalue and an upper bound for 
the largest eigenvalue of the preconditioned matrix. 

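Theorem 1. [1] Let $G, H \in \mathbb{R}^{n \times n}$ be positive definite diagonal matrices. Let
$AGA^T$ and $AHA^T$ be partitioned as in (3a) and (3c) respectively. Let $K_1$ and K
be defined in (4) and (5) respectively. Then

(i) $$\min\Big\{1, \min_{j \in \mathcal{J}_1 \setminus Q_1} \Big\{\frac{G_{jj}}{H_{jj}}\Big\}\Big\} \le \lambda_i\big((A_1 K_1 A_1^T)^{-1} A_1 G_1 A_1^T\big) \le \max\Big\{1, \max_{j \in \mathcal{J}_1 \setminus Q_1} \Big\{\frac{G_{jj}}{H_{jj}}\Big\}\Big\}$$

and

(ii) $$\min\Big\{1, \min_{j \in \mathcal{J}_1 \setminus Q_1} \Big\{\frac{G_{jj}}{H_{jj}}\Big\}\Big\} \le \lambda_i\big((AKA^T)^{-1} AGA^T\big) \le \max\Big\{1, \max_{j \in \mathcal{J}_1 \setminus Q_1} \Big\{\frac{G_{jj}}{H_{jj}}\Big\}\Big\} + \psi,$$

where $\psi = \sum_{j \in \mathcal{J}_2 \setminus Q_2} \cdots$ □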
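
Part (i) can be checked numerically on a small example. The sketch below uses random data and a hypothetical helper `precond_eigs`; it assumes, as in the construction of $K_1$, that $K_1$ equals $G_1$ on $Q_1$ and $H_1$ elsewhere, so that the diagonal of $K_1^{-1}G_1$ is 1 on $Q_1$ and $G_{jj}/H_{jj}$ on the remaining indices.

```python
import numpy as np


def precond_eigs(M, B):
    """Eigenvalues of M^{-1} B for symmetric positive definite M and symmetric B."""
    L = np.linalg.cholesky(M)
    X = np.linalg.solve(L, B)            # L^{-1} B
    S = np.linalg.solve(L, X.T).T        # L^{-1} B L^{-T}, similar to M^{-1} B
    return np.linalg.eigvalsh((S + S.T) / 2)


rng = np.random.default_rng(3)
m, r = 5, 9
A1 = rng.standard_normal((m, r))
g1 = 10.0 ** rng.uniform(-1, 1, r)
h1 = 10.0 ** rng.uniform(-1, 1, r)
Q1 = np.array([1, 4, 6])

k1 = h1.copy()
k1[Q1] = g1[Q1]                          # K_1 = H_1 + D_1

lam = precond_eigs((A1 * k1) @ A1.T, (A1 * g1) @ A1.T)
ratio = np.delete(g1 / h1, Q1)           # G_jj / H_jj for j outside Q_1
lower = min(1.0, ratio.min())
upper = max(1.0, ratio.max())
```

All entries of `lam` fall in the interval `[lower, upper]`, since each eigenvalue is a generalized Rayleigh quotient of the two weighted matrices.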

2.3 Choosing the Index Set Q 

The idea is to choose the index set Q such that the upper bound on the
spectral condition number of the preconditioned matrix $(AKA^T)^{-1}AGA^T$
is minimized. In particular, we choose $Q_1$ such that the upper bound on
$\kappa((A_1K_1A_1^T)^{-1}A_1G_1A_1^T)$ is minimized and $Q_2$ such that $\psi$ is minimized. This
implies that $Q_1$ consists of indices $j \in \mathcal{J}_1$ corresponding to the largest $G_{jj}/H_{jj} > 1$
and/or the smallest $G_{jj}/H_{jj} < 1$ such that $\kappa(K_1^{-1}G_1)$ is minimized. Similarly,
$Q_2$ consists of indices $j \in \mathcal{J}_2$ corresponding to the largest $G_{jj}\|A_j\|_2^2$. Thus the
required index set is $Q = Q_1 \cup Q_2$.

Baryamureeba, Steihaug, and Zhang [3] suggest choosing Q such that
$\kappa(K^{-1}G)$ (for K positive definite) is minimized. Wang and O'Leary [5,6] suggest
choosing Q to consist of the indices j corresponding to the largest values of $|G_{jj} - H_{jj}|$.
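
The two selection rules can be compared on a toy example. The sketch below is one possible reading of the rules for a fixed budget of q indices (with q at least 1), not code from any of the cited papers: the function names and the data are hypothetical. Since $\kappa(K^{-1}G)$ depends only on the extreme remaining ratios, it is enough to try every split between the smallest and the largest ratios.

```python
import numpy as np


def kappa_KinvG(g, h, Q):
    """kappa(K^{-1} G) when K_jj = G_jj on Q and K_jj = H_jj elsewhere."""
    k = h.copy()
    k[Q] = g[Q]
    d = g / k
    return d.max() / d.min()


def choose_Q_by_ratio(g, h, q):
    """Pick q indices minimizing kappa(K^{-1} G) by removing extreme ratios G_jj/H_jj."""
    order = np.argsort(g / h)                    # ascending ratios
    best_kappa, best_Q = np.inf, None
    for low in range(q + 1):                     # `low` smallest and q - low largest ratios
        Q = np.concatenate([order[:low], order[len(order) - (q - low):]])
        kappa = kappa_KinvG(g, h, Q)
        if kappa < best_kappa:
            best_kappa, best_Q = kappa, np.sort(Q)
    return best_Q, best_kappa


def choose_Q_by_difference(g, h, q):
    """Pick the q indices with the largest |G_jj - H_jj|."""
    return np.sort(np.argsort(np.abs(g - h))[-q:])


g = np.array([1e3, 1e-3, 1.0, 2.0, 1e2])
h = np.array([1.0, 1.0, 1.0, 1.0, 1e4])
Q_ratio, kappa_ratio = choose_Q_by_ratio(g, h, 2)
Q_diff = choose_Q_by_difference(g, h, 2)
kappa_diff = kappa_KinvG(g, h, Q_diff)
```

On this data the ratio rule selects indices 0 and 1 and leaves a condition number of about 200, while the difference rule selects indices 0 and 4 and leaves about 2000, because a large $|G_{jj} - H_{jj}|$ does not imply an extreme ratio.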
Table 1. Generated results for diagonal matrices G and H
(exponents that are illegible in the source are shown as ?)

Indices j     no. of indices   G_jj      H_jj      G_jj/H_jj   |G_jj - H_jj|
1,...,5       5                10^?      10^?      10^?        1.00 x 10^?
6,...,10      5                10^?      10^?      10^?        9.99 x 10^?
11,...,20     10               10^?      10^?      10^?        9.90 x 10^?
21,...,30     10               10^-?     10^-?     10^-1       1.00 x 10^-3
31,...,40     10               10^?      10^?      1           0
41,...,44     4                10^-?     10^-?     10^?        1.00 x 10^-1
45,...,47     3                10^?      10^?      10^1        9.00 x 10^?
48,...,51     4                10^-?     10^2      10^-?       1.00 x 10^?





[Plot: distribution of eigenvalues for afiro; horizontal axis: log10 of the size of the eigenvalues of $(AHA^T)^{-1}AGA^T$.]

Fig. 1. For K we choose Q (q = 20) such that $\kappa(K^{-1}G)$ is minimized. Then
we set $K_{jj} = 0$ for all $j \in \mathcal{J}_2 \setminus Q_2$. For V we choose Q to consist of indices
corresponding to the 20 largest $|G_{jj} - H_{jj}|$. Then we set $V_{jj} = 0$ for all $j \in \mathcal{J}_2 \setminus Q_2$

[Plot: distribution of eigenvalues for afiro.]

Fig. 2. We choose $Q_1$ ($q_1 = 17$) such that $\kappa(K_1^{-1}G_1)$ is minimized, $Q_2 = [45, 46, 47]$ ($q_2 = 3$)



Table 2. Generated results for diagonal matrices G and H
(exponents that are illegible in the source are shown as ?)

Indices j     no. of indices   G_jj      H_jj       G_jj/H_jj   |G_jj - H_jj|
1,...,10      10               10^?      2 x 10^?   5           5.00 x 10^?
11,...,20     10               10^?      10^1       10^2        9.90 x 10^?
21,...,30     10               10^-?     10^-3      10^-1       1.00 x 10^-3
31,...,40     10               10^?      10^1       1           0
41,...,44     4                10^-1     10^-5      10^?        1.00 x 10^-1
45,...,47     3                10^?      10^?       10^1        9.00 x 10^?
48,...,51     4                10^-3     10^2       10^-3       1.00 x 10^?



3 Numerical Testing 

We extract the matrix A from the netlib set [4] of linear programming problems.
We use the afiro test problem (m = 27, n = 51) in our numerical experiments.

[Plot: distribution of eigenvalues for afiro; horizontal axis: log10 of the size of the eigenvalues of $(AHA^T)^{-1}AGA^T$.]

Fig. 3. For K we choose Q (q = 18) such that $\kappa(K^{-1}G)$ is minimized. Then
we set $K_{jj} = 0$ for all $j \in \mathcal{J}_2 \setminus Q_2$. For V we choose Q to consist of indices
corresponding to the 18 largest $|G_{jj} - H_{jj}|$. Then we set $V_{jj} = 0$ for all $j \in \mathcal{J}_2 \setminus Q_2$



In Figures 1 and 2 we assume that the columns corresponding to indices
j = 45, 46, 47, and 51 are dense and G, H are given by Table 1.

In Figures 3 and 4 we assume that the columns corresponding to indices
j = 2, 7, 21, 24, 33, 45, 47, and 51 are dense and G, H are given by Table 2.

The results in Figures 1, 2, 3, and 4 strongly support the technique by
Baryamureeba [1] (Theorem 1) for choosing the index set Q when A has some dense
columns. Firstly, these results suggest that $Q_1$ should be chosen based on the
magnitude of $G_{jj}/H_{jj}$ (so that $\kappa(K_1^{-1}G_1)$ is minimized) instead of the largest
$|G_{jj} - H_{jj}|$. Secondly, the results show that it is not necessary to include in $Q_2$
indices $j \in \mathcal{J}_2$ corresponding to $G_{jj} \ll 1$. Instead $Q_2$ should consist of indices
$j \in \mathcal{J}_2$ corresponding to the largest $G_{jj}\|A_j\|_2^2$ values. Furthermore, the results
in Figures 2 and 4 show that the sparse part $A_1G_1A_1^T$ is not necessarily a good
preconditioner for $AGA^T$.

[Plots: distribution of eigenvalues for afiro; horizontal axes: log10 of the size of the eigenvalues of $(AKA^T)^{-1}AGA^T$, $(A_1G_1A_1^T)^{-1}AGA^T$ and $(A_1H_1A_1^T)^{-1}AGA^T$.]

Fig. 4. We choose $Q_1$ ($q_1 = 14$) such that $\kappa(K_1^{-1}G_1)$ is minimized, $Q_2 = [2, 7, 45, 47]$ ($q_2 = 4$)



4 Concluding Remarks 

We have given numerical results to show that the derived theoretical bounds
on the eigenvalues of the preconditioned matrix (Theorem 1) are actually good
bounds. Lastly, we believe that these results strongly support the technique by
Baryamureeba [1] (Theorem 1) for solving large-scale linear programs, and the
technique merits further study.

Abstract. A subspace linesearch strategy for the globalization of the
Newton-GMRES method is proposed. The main feature of our proposal
is the simple and inexpensive way we determine descent directions in
the low-dimensional subspaces generated by GMRES. Global and local
quadratic convergence is established under standard assumptions.



1 Introduction 

We consider the problem of solving large-scale systems of equations

$$F(x) = 0, \tag{1}$$

where F is a nonlinear function from $\mathbb{R}^n$ to $\mathbb{R}^n$, and propose a new iterative
process in the context of the inexact methods [4]. Specifically, we consider the
relevant framework of the Newton-Krylov methods. Well known convergence
properties of these methods motivated recent work to create robust and
locally fast algorithms and to develop reliable and efficient software ([2,8,9]).
The basic idea of a Newton-Krylov method is to construct a sequence of iterates
$\{x_k\}$ such that, at each iteration k, the correction $s_k = x_{k+1} - x_k$ is taken from
a subspace of small dimension and satisfies

$$F'(x_k) s_k = -F(x_k) + r_k, \qquad \|r_k\| \le \bar\eta_k \|F(x_k)\|, \tag{2}$$

where $F'$ is the system Jacobian, $\bar\eta_k$ is a suitable scalar in [0, 1) called forcing
term and $r_k$ is commonly referred to as the residual vector.

Globally convergent modifications of Newton-Krylov methods are commonly
called hybrid methods [2]. They are obtained using globally convergent strategies
for the unconstrained minimization problem

$$\min_{x \in \mathbb{R}^n} f(x), \tag{3}$$

where f is an appropriately chosen merit function whose global minimum is a
zero of F.

* This work was partially supported by MURST Cofin98 "Metodologie numeriche avanzate
per il calcolo scientifico", CNR "Progetto coordinato sistemi di calcolo di grandi
dimensioni e calcolo parallelo", Rome, Italy

Hybrid methods based on a linesearch backtracking strategy are often used.
At the k-th step of these methods, a direction p in $\mathbb{R}^n$ is chosen and the new
iterate $x_{k+1}$ has the form $x_{k+1} = x_k + \theta p$ where $\theta \in (0, 1]$ is such that $f(x_{k+1}) < f(x_k)$.
The existence of such a $\theta$ is ensured if p is a descent direction for f at $x_k$,
i.e. $\nabla f(x_k)^T p < 0$ ([5]). Classical choices of f are $f = f_1$ or $f = f_2$ where $f_1 = \|F\|_2$
and $f_2 = \|F\|_2^2/2$. In both cases, a vector p satisfying $\|F'(x_k)p + F(x_k)\|_2 < \|F(x_k)\|_2$
is ensured to be a descent direction for f at $x_k$ [3]. Using this result,
several authors proposed globally convergent modifications of the basic
Newton-Krylov method where backtracking linesearch procedures along the inexact step
$s_k$ are performed (see e.g. [2,3,6,8,9]).

Here we are concerned with global strategies that search for a decrease of the
merit function in a low-dimensional subspace generated by the Krylov method.
This strategy can be stated as follows: letting S be a subspace which contains
$s_k$ and is generated by the Krylov method, find a vector $\Delta_k \in S$ such that

$$\nabla f(x_k)^T \Delta_k < 0 \quad \text{and} \quad f(x_k + \Delta_k) < f(x_k).$$

Namely, we formulate a global strategy using the subspace approach, which has
proved to be a promising way of solving large-scale nonlinear systems ([1,2,3,7]).
The particular Krylov method we consider is the well known iterative process
GMRES [10].

In a Newton-GMRES context we propose an iterative method where a linesearch
procedure along the inexact Newton step $s_k$ is combined with an alternative
linesearch strategy in a low-dimensional subspace S. The given approach
is related to a curvilinear linesearch globalization procedure recently proposed
in [1]. Specifically, first we use the direction of the inexact Newton step $s_k$. If
it does not work well, i.e. a relatively small number of backtracking steps does not
suffice to decrease the value of f, we fall back on a step obtained by a slower method.
The direction used in the latter method is a descent direction which is selected among the
coordinate vectors and the steepest descent direction of the merit function in S.
A key feature of our proposal is the simple and inexpensive way of determining
such an alternative direction. In fact, we use only information that is built up
during the GMRES iterations.

The given theoretical analysis shows that our method is globally convergent
and that close to the solution it reduces to the Newton-GMRES method. Therefore, it
retains the fast local convergence rate of the Inexact Newton methods. Moreover,
the proposed strategy is consistent with preconditioning techniques, too.

Throughout the paper, $\|\cdot\|$ denotes the 2-norm of the specified vector (matrix)
and $e_j$ the j-th unit coordinate vector, with its dimension
inferred from the context. The symbol $(x)_i$ represents the i-th component of
a vector x. The condition number of a real matrix A is denoted by $\kappa_2(A)$. Further,
the gradient vector of a given smooth real function $f : \mathbb{R}^n \to \mathbb{R}$ is denoted
by $\nabla f(x)$. The closed ball with center y and radius $\delta$ is indicated by $N_\delta(y)$.

2 Descent Directions in a Subspace

We consider an iterative process where, at the k-th iteration, the linear system

$$F'(x_k) s = -F(x_k), \tag{4}$$

is approximately solved by GMRES as specified in (2).

Let $s_k^0 \in \mathbb{R}^n$ be the initial guess for the true solution of (4) and
$r_k^0 = -F'(x_k)s_k^0 - F(x_k)$ the initial residual. GMRES generates a sequence $\{s_k^m\}$ of
vectors such that each $s_k^m$ is the solution to

$$\min_{s \in s_k^0 + K_m} \|F'(x_k)s + F(x_k)\|, \tag{5}$$

where $K_m$ is the Krylov subspace

$$K_m = \operatorname{span}\{r_k^0,\; F'(x_k)r_k^0,\; (F'(x_k))^2 r_k^0,\; \ldots,\; (F'(x_k))^{m-1} r_k^0\}.$$

In order to solve (5), the Arnoldi process is used to construct an orthonormal
basis $v_1, v_2, \ldots, v_m$ of $K_m$ where $v_1 = r_k^0/\|r_k^0\|$. Using this process, the matrices
$V_m = [v_1, v_2, \ldots, v_m] \in \mathbb{R}^{n \times m}$ and $V_{m+1} = [v_1, v_2, \ldots, v_{m+1}] \in \mathbb{R}^{n \times (m+1)}$
verify the relevant relation

$$F'(x_k) V_m = V_{m+1} H_m, \tag{6}$$

where $H_m \in \mathbb{R}^{(m+1) \times m}$ is an upper Hessenberg matrix. Further, (5) reduces to

$$\min_{y \in \mathbb{R}^m} \|\rho_k e_1 - H_m y\|, \tag{7}$$

where $\rho_k = \|r_k^0\|$. In theory GMRES iterates are computed until the current vector
$s_k^m$ satisfies (2). Then, $s_k$ is set equal to $s_k^m$ and the new iterate $x_{k+1} = x_k + s_k$
is formed. However, for large problems and large values of m, storage requirements
for the basis of $K_m$ may become prohibitive. This problem is overcome
by a restarted GMRES, i.e. a process that uses GMRES iteratively and restarts
it after a fixed number of iterations using the last computed iterate [10].
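
The Arnoldi relation (6) and the small least squares problem (7) can be sketched as follows. This is a minimal dense illustration, not a production GMRES: it assumes a zero initial guess, an explicitly stored Jacobian `J`, no restarts and no breakdown, and the name `gmres_arnoldi` and the test data are hypothetical.

```python
import numpy as np


def gmres_arnoldi(J, F, m):
    """m GMRES steps for J s = -F with zero initial guess; returns s, V_{m+1}, H_m."""
    n = F.size
    r0 = -F                                  # r^0 = -J s^0 - F with s^0 = 0
    rho = np.linalg.norm(r0)
    V = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    V[:, 0] = r0 / rho
    for j in range(m):
        w = J @ V[:, j]
        for i in range(j + 1):               # modified Gram-Schmidt
            H[i, j] = V[:, i] @ w
            w -= H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        V[:, j + 1] = w / H[j + 1, j]        # assumes no breakdown (H[j+1, j] > 0)
    rhs = np.zeros(m + 1)
    rhs[0] = rho
    y, *_ = np.linalg.lstsq(H, rhs, rcond=None)   # problem (7)
    return V[:, :m] @ y, V, H


rng = np.random.default_rng(2)
n_dim, m_steps = 8, 5
J = 10.0 * np.eye(n_dim) + rng.standard_normal((n_dim, n_dim))
Fx = rng.standard_normal(n_dim)
s, V, H = gmres_arnoldi(J, Fx, m_steps)
residual = np.linalg.norm(J @ s + Fx)
```

With a zero initial guess the first row of `H` is also what yields the directional derivatives used later, since $V_{m+1}^T F(x_k) = -\rho_k e_1$.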

Now we turn our attention to the way one may select descent directions
which are alternative to $s_k$ and belong to a low-dimensional subspace generated
by GMRES. We consider the case $f = f_2$, but we point out that, if $f = f_1$,
analogous conclusions can be drawn.

Following [2], we restrict our search direction from $x_k$ to be in the subspace

$$S_g = \operatorname{span}\{v_1, v_2, \ldots, v_m, s_k^0\}.$$

Clearly, $s_k \in S_g$ and if $s_k^0 \in K_m$ then $S_g$ coincides with $K_m$.

Let $W = [w_1, w_2, \ldots, w_{m+1}]$ be the orthonormal basis of $S_g$ such that $w_i = v_i$
for i = 1, 2, ..., m and $w_{m+1}$ is computed by the Gram-Schmidt method. Thus, the
global strategy assumes the form of the low-dimensional minimization problem

$$\min_{y \in \mathbb{R}^{m+1}} g(y), \qquad g(y) = f_2(x_k + Wy),$$

with $g : \mathbb{R}^{m+1} \to \mathbb{R}$. Since $F'(x_k)W = [V_{m+1}H_m,\; F'(x_k)w_{m+1}]$, the vector
$\nabla g(0)$ is given by

$$\nabla g(0) = [V_{m+1}H_m,\; F'(x_k)w_{m+1}]^T F(x_k). \tag{8}$$

Now, we show that it is possible to exploit information gathered by GMRES
in order to search for descent directions alternative to $s_k$ in $S_g$. In fact, we have the
steepest descent direction $d_m = -\nabla g(0)$ for g(y) at y = 0. Further, given the
coordinate vectors $e_j \in \mathbb{R}^{m+1}$, j = 1, ..., m + 1, we have $w_j = We_j$ and

$$\nabla f_2(x_k)^T w_j = \nabla g(0)^T e_j = (\nabla g(0))_j. \tag{9}$$

Hence, we can conclude that if $(\nabla g(0))_j < 0$, $w_j$ is a descent direction for $f_2$
at $x_k$. An interesting additional observation is that if $s_k^0 = 0$, $S_g$ coincides
with $K_m$. Then, $W = V_m$, and

$$v_1 = r_k^0/\|r_k^0\| = -F(x_k)/\rho_k,$$

$$\nabla g(0) = (F'(x_k)V_m)^T F(x_k) = (V_{m+1}H_m)^T F(x_k) = -\rho_k H_m^T e_1.$$

Therefore, at no additional cost, the elements $h_{1,j}$, j = 1, ..., m, of the first row
of $H_m$ give us, up to the factor $-\rho_k$, the directional derivatives of $f_2$ at $x_k$ along $v_j$.

We remark that Krylov methods have the virtue of requiring no matrix evaluation.
In fact, the action of the Jacobian $F'$ on a vector v can be approximated
by means of finite differences ([3,8]). An attractive feature of our global strategy
restricted to $S_g$ is that (8) does not require the Jacobian matrix explicitly and
needs only the product $F'(x_k)w_{m+1}$.
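
A forward-difference approximation of a Jacobian-vector product, of the kind referred to above, might look like the following sketch. The step-size rule, the function name `jac_vec` and the two-dimensional test system are assumptions chosen for illustration and are not taken from [3] or [8].

```python
import numpy as np


def jac_vec(F, x, v, eps=1e-7):
    """Forward-difference approximation of F'(x) v without forming F'(x)."""
    nv = np.linalg.norm(v)
    if nv == 0.0:
        return np.zeros_like(x)
    t = eps * max(1.0, np.linalg.norm(x)) / nv
    return (F(x + t * v) - F(x)) / t


def F_test(x):
    """Hypothetical test system F : R^2 -> R^2."""
    return np.array([x[0] ** 2 + x[1] - 3.0, x[0] + np.sin(x[1])])


x0 = np.array([1.0, 0.5])
w = np.array([0.6, -0.8])                 # a unit direction, e.g. w_{m+1}
Jw = jac_vec(F_test, x0, w)

# Directional derivative of f_2 = ||F||^2 / 2 along w: F(x)^T F'(x) w
dir_deriv = F_test(x0) @ Jw
```

One extra evaluation of F is enough for the last component of (8), since the other components come from the first row of $H_m$ when the initial guess is zero.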

Moreover, following [1], it can be shown that, if a right preconditioner $P_r$ is
used, the global strategy is performed in the subspace

$$S_g^P = \operatorname{span}\{v_1, v_2, \ldots, v_m, s_k^0\},$$

where $v_1, v_2, \ldots, v_m$ is an orthonormal basis of the Krylov subspace

$$K_m^P = \operatorname{span}\{r_k^0,\; (F'(x_k)P_r) r_k^0,\; (F'(x_k)P_r)^2 r_k^0,\; \ldots,\; (F'(x_k)P_r)^{m-1} r_k^0\}.$$



3 The New Method 

In this section we present a new globally convergent hybrid method which
combines two linesearch backtracking strategies.

At each iteration the Inexact Linesearch Backtracking (ILB) strategy given
by Eisenstat and Walker in [6] is tried first. If within a maximum number $N_b$ of
backtracks no progress is found in the merit function $f_1$, we leave the direction
$\bar s_k$ and apply a new globalization process, called the CD (Coordinate Directions)
strategy. This alternative global method searches for a reduction in the merit
function $f_2$ along a properly selected vector of the subspace $S_g$ and uses the
following quadratic model for $f_2(x_k + Wy)$ at $x_k$:

$$\hat g(y) = \|F'(x_k)Wy + F(x_k)\|^2/2,$$

with $\hat g : \mathbb{R}^{m+1} \to \mathbb{R}$.

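The resulting k-th iteration can be sketched as follows.

Algorithm

Given $x_k$, $\eta_{max} \in (0,1)$, $\bar\eta_k \in [0, \eta_{max}]$, $\alpha, \beta \in (0, 1)$, $0 < \theta_m < \theta_M < 1$, $N_b > 0$.

1. Apply GMRES to compute $\bar s_k$ such that $\|F(x_k) + F'(x_k)\bar s_k\| \le \bar\eta_k\|F(x_k)\|$.

2. Perform the ILB-strategy:

   2.1 Set $s_k = \bar s_k$, $\eta_k = \bar\eta_k$, $n_b = 0$.

   2.2 While $f_1(x_k + s_k) > (1 - \alpha(1 - \eta_k)) f_1(x_k)$ and $n_b < N_b$ do:
       Choose $\theta \in [\theta_m, \theta_M]$.
       Update $s_k = \theta s_k$, $\eta_k = 1 - \theta(1 - \eta_k)$ and $n_b = n_b + 1$.

3. If

   $$f_1(x_k + s_k) \le (1 - \alpha(1 - \eta_k)) f_1(x_k), \tag{10}$$

   set $\Delta_k = s_k$. Go to step 5.

4. Perform the CD-strategy:

   4.1 Compute $\nabla g(0)$. Set $d_m = -\nabla g(0)$.

   4.2 Let $j^*$ be such that $|(\nabla g(0))_{j^*}| = \max_{1 \le j \le m+1} |(\nabla g(0))_j|$.

   4.3 If $(\nabla g(0))_{j^*} > 0$
       set $u = -w_{j^*}$, $e = -e_{j^*}$
       else
       set $u = w_{j^*}$, $e = e_{j^*}$.

   4.4 Compute $\alpha_e = \operatorname{argmin}_\alpha \hat g(\alpha e)$, $\alpha_d = \operatorname{argmin}_\alpha \hat g(\alpha d_m)$.

   4.5 If $f_2(x_k + \alpha_d W d_m) < f_2(x_k + \alpha_e W e)$
       set $u = \alpha_d W d_m$, $\nabla f_2(x_k)^T u = -\alpha_d \|d_m\|^2$
       else
       set $u = \alpha_e W e$, $\nabla f_2(x_k)^T u = \alpha_e \nabla g(0)^T e$.

   4.6 If

       $$f_2(x_k + u) \le f_2(x_k) + \alpha \nabla f_2(x_k)^T u, \tag{11}$$

       $$\nabla f_2(x_k + u)^T u \ge \beta \nabla f_2(x_k)^T u, \tag{12}$$

       set $\Delta_k = u$. Go to step 5.

   4.7 Choose $\theta \in [\theta_m, \theta_M]$. Update $u = \theta u$. Go to step 4.6.

5. Set $x_{k+1} = x_k + \Delta_k$.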
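
Steps 2 and 3 of the algorithm can be sketched in a few lines. The sketch assumes a fixed $\theta$ inside $[\theta_m, \theta_M]$ rather than an adaptive choice, and the function name `ilb_backtrack`, the default parameter values and the scalar test problem are hypothetical.

```python
import numpy as np


def ilb_backtrack(F, x, s_bar, eta_bar, alpha=1e-4, theta=0.5, N_b=10):
    """Steps 2-3: shrink the inexact Newton step until test (10) holds or N_b is reached."""
    def f1(z):
        return np.linalg.norm(F(z))

    s, eta, nb = s_bar.copy(), eta_bar, 0
    fx = f1(x)
    while f1(x + s) > (1.0 - alpha * (1.0 - eta)) * fx and nb < N_b:
        s = theta * s                        # shorter step along the same direction
        eta = 1.0 - theta * (1.0 - eta)      # forcing term grows accordingly
        nb += 1
    accepted = f1(x + s) <= (1.0 - alpha * (1.0 - eta)) * fx   # test (10)
    return s, eta, accepted


def F_atan(z):
    """Classical example on which the full Newton step overshoots."""
    return np.arctan(z)


x_k = np.array([3.0])
s_newton = -F_atan(x_k) * (1.0 + x_k ** 2)   # exact Newton step, so eta_bar = 0
s_k, eta_k, ok = ilb_backtrack(F_atan, x_k, s_newton, 0.0)
```

In this example two reductions are needed, giving `s_k = s_newton / 4` and `eta_k = 0.75`; the accepted step keeps the ratio $(1 - \eta_k)/(1 - \bar\eta_k)$ relative to the initial step. When `ok` is false, the algorithm would move on to the CD-strategy of step 4.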



We remark that in the ILB-process we move along the direction $\bar s_k$ and
we select successively shorter steps $s_k$ of the form $s_k = \frac{1-\eta_k}{1-\bar\eta_k}\bar s_k$. In the
CD-strategy, we select among $\pm w_1, \ldots, \pm w_{m+1}$ the vector that produces the
maximum local decrease of $f_2$. Then, we form $\alpha_d d_m$ and $\alpha_e e$, i.e. the minimizers
of the quadratic model $\hat g(y)$ along $d_m$ and e respectively. The new direction u
is chosen between $W d_m$ and $W e$ on the basis of the smaller of the two
values $f_2(x_k + \alpha_d W d_m)$ and $f_2(x_k + \alpha_e W e)$. Finally, a backtracking linesearch
along u is performed until the Goldstein-Armijo conditions (11) and (12) are
met. Since u is a descent direction for $f_2$ at $x_k$, there exists a point $x_{k+1}$ such
that (11) and (12) hold (see [5, Theorem 6.3.2]).

4 Convergence Results



Now we will address the convergence behaviour of the method described in the
previous section. We will make the following assumptions:

- $F : \mathbb{R}^n \to \mathbb{R}^n$ is continuously differentiable;

- $F'$ is Lipschitz continuous in $L = \{x \in \mathbb{R}^n : f_2(x) \le f_2(x_0)\}$;

- for each k, $F'(x_k)$ is nonsingular and $\|F(x_k) + F'(x_k)\bar s_k\| \le \eta_{max}\|F(x_k)\|$
holds.

Note that from the first two assumptions it follows that $\nabla f_2$ is Lipschitz continuous
in L. Furthermore, the last assumption prevents the method from breaking down.
In fact, the (k + 1)-th iterate cannot be determined either if $x_k$ is such that
$F'(x_k)$ is singular or if there are no descent directions in $S_g$. On the other hand,
if the step $\bar s_k$ provided by GMRES satisfies $\|F(x_k) + F'(x_k)\bar s_k\| \le \eta_{max}\|F(x_k)\|$,
then the existence of descent directions in $S_g$ is guaranteed. Note that this is
not a serious restriction when the null starting vector for GMRES is used, since
$\eta_{max}$ can be taken arbitrarily near one.

We will show that, if there exists a limit point $x^*$ of $\{x_k\}$ such that $F'(x^*)$
is invertible, then $F(x^*) = 0$ and $x_k \to x^*$. Further, for k sufficiently large, $x_{k+1}$
has the form $x_{k+1} = x_k + \bar s_k$; then the ultimate rate of convergence depends on
the choice of the forcing terms $\bar\eta_k$, as shown in [4].

In our analysis we will use the following two results that show the convergence 
behaviour of methods obeying (10) and (11)-(12), respectively. 



Theorem 1 ([6]). If $\{x_k\}$ is a sequence generated applying the ILB-strategy, i.e.
for each k (10) is satisfied, and $x^*$ is a limit point such that $F'(x^*)$ is invertible,
then $\|F(x_k)\| \to 0$. Further, let $\Gamma = 2\|F'(x^*)^{-1}\|(1 + \eta_{max})/(1 - \eta_{max})$ and $\delta > 0$
sufficiently small that $\|F'(x)^{-1}\| \le 2\|F'(x^*)^{-1}\|$ whenever $x \in N_\delta(x^*)$, and also

\\F{y) - F{x) - F\x){y - a;)|| < ^^\\y - x||, (13) 



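if $x, y \in N_{2\delta}(x^*)$. Then, if $x_k \in N_\delta(x^*)$, $x_{k+1}$ has the form $x_{k+1} = x_k + s_k(\eta_k)$
with $s_k(\eta_k) = (1 - \eta_k)\bar s_k/(1 - \bar\eta_k)$ and

$$1 - \eta_k \ge \min\left\{1 - \bar\eta_k,\; \frac{\theta_m \delta}{\Gamma \|F(x_k)\|}\right\}. \tag{14}$$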



Theorem 2 ([5], Th. 6.3.3). Let $x_0 \in \mathbb{R}^n$ be given and $\{x_k\}$ be a sequence such
that for each $k \ge 0$, $\Delta_k = x_{k+1} - x_k$ satisfies (11) and (12) and $\nabla f(x_k)^T \Delta_k < 0$.
Then, either $\nabla f(x_k) = 0$ for some k or $\lim_{k \to \infty} \nabla f(x_k)^T \Delta_k / \|\Delta_k\| = 0$.

Now, the main convergence results for the proposed hybrid method can be stated.



Theorem 3. Assume that there exists a limit point $x^*$ such that $F'(x^*)$ is
invertible. Then, the sequence $\{\|F(x_k)\|\}$ converges to zero.

Proof. Note that the sequence $\{\|F(x_k)\|\}$ is strictly decreasing and bounded
from below by zero. Hence, it is convergent. Let {i^} be the subsequence such 
that {xk} X* . 

If there exists an index k such that, for k > k, Xk is computed by ILB-strategy, 
from Theorem 1 we get that ||T'(ifc)|| ^ 0 and therefore F{x*) = 0. 

Otherwise, let K be the set of indices such that, for k G K, Xk is computed 
by the CD-strategy and let {xi^} be the subsequence of {ifc} such that Ik G K. 
A direct adaptation of Theorem 2 yields 

\\AiJ 

Further, by construction, at step h-th CD-strategy gives 

= l(Vff(o)p I = 

if the direction Wj* is selected, and 



if the steepest descent direction in Sg is chosen. Then, since from [3, Corollary 
3.5] we have 



||M^^V/2(y)|| > 



1 "Hmax 

(It ^maa:)^2(A^(Xfc)) 



l|V/2(y)||, 



it follows 



|V/2(y)^AJ 

\\Alk\\ 



> 



1 1 - 
■^72 (1 -t“ Tjmax 



'^max 

)k2{F'{xk)) 



l|V/2(y)||. 



Due to the invertibility of $F'(x^*)$, for k sufficiently large $\kappa_2(F'(x_k))$ can be
bounded from above, and as a consequence, $\|F(x_{l_k})\| \to 0$ and $\|F(x_k)\| \to 0$.

Theorem 4. Assume that there exists a limit point $x^*$ such that $F'(x^*)$ is
invertible. Then, there exists a sufficiently large $\bar k > 0$ such that, for $k > \bar k$,
$x_{k+1} = x_k + \bar s_k$. Further $x_k \to x^*$.



Proof. ^From Theorem 3 it follows ||F(a;fe)|| — *■ 0. Hence F{x*) = 0. Assume 
9m < 1/2. Set K = ||F'(a;*)“^||, F = 2K{1 r?max)/(l - Vmax), and let <5 
sufficiently small that, if x G Ng{x*), ||T'(x)“^|| < 2K and (13) holds whenever 
x,y G N 2 s(x*). 

Since x* is a limit point of {xfe} and F{x*) = 0 there exists a k sufficiently 
large that 

Xk e y = {?/|||y- j:*|| < ^,\\F{y)\\ < ^}, 

where e < <5 is such that 2K9m^/F < 5/2. Clearly, 9m5 / {F\\F{xk)\\) > 1, then 
from (14) it follows that the fc-iteration is successfully performed with pk = ijk-, 
i.e. $x_{k+1} = x_k + \bar s_k$. To complete the proof, we show that $x_{k+1} \in N_\epsilon$. To this
end, first note that from [6, Th. 6.1] it follows

lisfcll < T(l-77fc)||F(xfc)||. (15) 



Further, since 



lisfcll < T(1 - r]k)\\F{xk)\\ < r||J^(xfc)|| < OmS < 

we have 

\\xk+i - CC*|| < \\xk - X*|| + lisfell < S, 

which implies Xk+i G Ns{x*). Finally, from (13) the following relation can be 
derived 

||a;fe-x*|| <2iF||F(xfe)||. 

It yields 



||x,+i -x*|| < 2K\\F{xk+i)\\ < 2K\\F{xk)\\ < 

Thus, Xk+i S iVe and we can conclude that there exists a fc > 0 such that, 
for k > k, no backtracking is performed along Sk, Xk € and Xk ^ x*. 



References 

1. Bellavia S., Morini B.: A globally convergent Newton-GMRES subspace method
for systems of nonlinear equations. Submitted for publication.

2. Brown P. N., Saad Y.: Hybrid Krylov Methods for nonlinear systems of equations.
SIAM J. Sci. Stat. Comput. 11 (1990) 450-481.

3. Brown P. N., Saad Y.: Convergence Theory of Nonlinear Newton-Krylov algorithms.
SIAM J. Optim. 4 (1994) 297-330.

4. Dembo R. S., Eisenstat S. C., Steihaug T.: Inexact Newton Methods. SIAM J.
Numer. Anal. 19 (1982) 400-408.

5. Dennis J. E., Schnabel R. B.: Numerical Methods for Unconstrained Optimization
and Nonlinear Equations. Prentice Hall, Englewood Cliffs, NJ, 1983.

6. Eisenstat S. C., Walker H. F.: Globally Convergent Inexact Newton Methods.
SIAM J. Optim. 4 (1994) 393-422.

7. Feng D., Pulliam T. H.: Tensor-GMRES method for large systems of nonlinear
equations. SIAM J. Optim. 7 (1997) 757-779.

8. Kelley C. T.: Iterative Methods for Linear and Nonlinear Equations. SIAM,
Philadelphia, 1995.

9. Pernice M., Walker H. F.: NITSOL: a new iterative solver for nonlinear systems.
SIAM J. Sci. Comput. 19 (1998) 302-318.

10. Saad Y., Schultz M. H.: GMRES: a generalized minimal residual method for solving
nonsymmetric linear systems. SIAM J. Sci. Stat. Comput. 7 (1986) 856-869.



Comparative Analysis of Marching Algorithms 
for Separable Elliptic Problems 



Gergana Bencheva 



Central Laboratory of Parallel Processing, Bulgarian Academy of Sciences 
Acad. G. Bontchev Str., B1.25A, 1113 Sofia, Bulgaria 
gery@cantor.bas.bg



Abstract. Standard marching algorithms (MA) and generalized marching
algorithms (GMA) for 2D separable second order elliptic problems on
rectangular n × m grids are described. Their numerical stability and
computational complexity are theoretically and experimentally compared.
Results of numerical experiments performed to demonstrate the stability
of GMA versus the instability of MA are presented.

Keywords: fast elliptic solvers, marching algorithms, computational 
1 Introduction

After discretizing separable elliptic boundary value problems on rectangular
domains, linear algebraic systems with a special block banded structure are obtained.
Fast elliptic solvers are highly efficient algorithms for their direct solution,
and the so-called marching algorithms form one class of them.

The goal of this study is to review theoretically and compare experimentally
the numerical stability and computational complexity of the standard marching
algorithm (MA) and the generalized marching algorithm (GMA) for 2D separable
elliptic problems discretized on a rectangular n × m grid. These two
algorithms were first proposed in [2,3] and later reformulated in [5] by using the
incomplete solution technique, which slightly reduces the asymptotic operation
count of the GMA. The standard marching algorithm is optimal in the
sense that its computational cost depends linearly on the dimension of the system;
namely, the number of arithmetic operations for its implementation is of
order $N_{MA} = O(nm)$. Unfortunately, MA is unstable and hence is of practical
interest only for sufficiently small problems, or more generally for $m \ll n$. The
GMA is a stabilized version of the MA obtained (in [5]) by limiting the size of
the marching steps and using the incomplete solution technique for problems
with sparse right-hand sides in the second part of the algorithm. The total cost
of the resulting algorithm, in the case when $m = n$, $n + 1 = p(k + 1)$, $p, k \in \mathbb{Z}$,
is of order $N_{GMA} = O(n^2 \log p + n^2)$.



L. Vulkov, J. Wasniewski, and P. Yalamov (Eds.): NAA 2000, LNCS 1988, pp. 76-84, 2001.
© Springer-Verlag Berlin Heidelberg 2001




The remainder of the paper is organized as follows. At the beginning of 
Section 2 the considered problem is formulated and the technique for incomplete 
solution of problems with sparse right-hand sides is briefly outlined. Next, the 
algorithms MA and GMA are described in the same section. Results of numerical 
experiments that confirm the theoretical estimates are given in Section 3. 

At the end of the paper some concluding remarks about the applications of 
the presented algorithms as representatives of fast elliptic solvers are formulated. 



2 Description of the Algorithms 

In this section we present the standard marching algorithm (MA) and a sta- 
bilized version of it, the generalized marching algorithm (GMA), obtained by 
limiting the size of the marching steps combined with the so-called technique of 
incomplete solution of problems with sparse right-hand sides. The exposition in 
this section is based on the survey [1]. 

2.1 Formulation of the Problem and Preliminaries 

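A separable second order elliptic equation of the form:

$$-\sum_{s=1}^{2} \frac{\partial}{\partial x_s}\left( a_s(x_s)\, \frac{\partial u}{\partial x_s} \right) = f(x), \qquad x = (x_1, x_2) \in \Omega = (0,1)^2, \qquad (1)$$

$$u = 0 \quad \text{on } \partial\Omega$$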

is discretized by finite differences or by piecewise linear finite elements on right- 
angled triangles. The following block banded system with tensor product matrix 
is obtained: 



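$$A\mathbf{x} = \mathbf{f}, \qquad (2)$$

where

$$A = B \otimes I_n + I_m \otimes T =
\begin{pmatrix}
T + b_{1,1} I_n & b_{1,2} I_n & & 0 \\
b_{2,1} I_n & T + b_{2,2} I_n & \ddots & \\
 & \ddots & \ddots & b_{m-1,m} I_n \\
0 & & b_{m,m-1} I_n & T + b_{m,m} I_n
\end{pmatrix},$$

$$\mathbf{x} = (\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_m)^T, \qquad \mathbf{f} = (\mathbf{f}_1, \mathbf{f}_2, \dots, \mathbf{f}_m)^T,$$

$$\mathbf{x}_j = (x_{1,j}, x_{2,j}, \dots, x_{n,j})^T, \qquad \mathbf{f}_j = (f_{1,j}, f_{2,j}, \dots, f_{n,j})^T,$$

$$\mathbf{x}_j, \mathbf{f}_j \in \mathbb{R}^n, \qquad j = 1, \dots, m.$$

Here, $I_n$ is the identity n × n matrix and $\otimes$ is the Kronecker (or tensor)
product of the matrices C and D, defined by $C_{m_1 \times n_1} \otimes D_{m_2 \times n_2} = (c_{i,j} D)_{i=1,\,j=1}^{m_1,\,n_1}$,
where $C = (c_{i,j})_{i=1,\,j=1}^{m_1,\,n_1}$.

The matrices $T = (t_{i,j})_{i,j=1}^{n}$ and $B = (b_{i,j})_{i,j=1}^{m}$ are tridiagonal, symmetric
and positive definite, corresponding to a finite difference approximation of the
one-dimensional operators $-\frac{\partial}{\partial x_s}\left( a_s(x_s) \frac{\partial}{\partial x_s}(\cdot) \right)$, for s = 1, 2, respectively.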

Here, we briefly describe the so-called incomplete solution technique applied 
to systems of the form (2) with a sparse right-hand side. That technique has 
independently been proposed by Banegas, Proskurowski and Kuznetsov. More 
details may be found in [1,4]. 

It is assumed that the right-hand side f has only d ($d \ll m$) nonzero block
components and for some reason only r ($r \ll m$) block components of the
solution are needed. Let for definiteness $\mathbf{f}_j = 0$ for $j \ne j_1, j_2, \dots, j_d$. To find the
needed components $\mathbf{x}_{j'_1}, \mathbf{x}_{j'_2}, \dots, \mathbf{x}_{j'_r}$ of the solution, the well-known algorithm
for separation of variables is applied taking advantage of the right-hand side
sparsity:

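Algorithm SRHS

Step 0. determine all the eigenvalues and d components of all the eigenvectors
of the tridiagonal matrix B.

Step 1. compute the Fourier coefficients $\beta_{i,k}$ of f from the equations:

$$\beta_{i,k} = \sum_{s=1}^{d} q_{j_s,k}\, f_{i,j_s}, \qquad i = 1, \dots, n, \quad k = 1, \dots, m.$$

Step 2. solve m tridiagonal n × n systems of linear equations:

$$(\lambda_k I_n + T)\, \boldsymbol{\eta}_k = \boldsymbol{\beta}_k, \qquad k = 1, \dots, m.$$

Step 3. recover r components of the solution per lines using

$$\mathbf{x}_j = \sum_{k=1}^{m} q_{j,k}\, \boldsymbol{\eta}_k \qquad \text{for } j = j'_1, \dots, j'_r.$$

Remark 1. Here, $\{\mathbf{q}_k, \lambda_k\}_{k=1}^{m}$ denote the eigenpairs of the tridiagonal matrix
$B_{m \times m}$, i.e., $B\mathbf{q}_k = \lambda_k \mathbf{q}_k$, $k = 1, \dots, m$, and $q_{j,k}$ is the j-th component of $\mathbf{q}_k$.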

The computational complexity of Algorithm SRHS is given in: 

Proposition 21 The Algorithm SRHS requires $m[2(r + d)n + (5n - 4)]$ arithmetic
operations in the solution part, $m[n$ divisions $+\ 3(n - 1)$ other operations$]$
to factor the tridiagonal matrices $\lambda_k I_n + T$, $k = 1, \dots, m$, in $LD^{-1}U$ form,
and $O(dm^2) + 9m^2$ arithmetic operations for computing all the eigenvalues and d
components of all the eigenvectors of the matrix B.
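
The steps of Algorithm SRHS can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the implementation used in the paper: the function name `srhs_solve` and its arguments are hypothetical, block indices are 0-based, the eigenvectors of B are assumed orthonormal (as returned by `numpy.linalg.eigh`), and a dense solve stands in for the tridiagonal factorization of Step 2.

```python
import numpy as np

def srhs_solve(B, T, f_blocks, wanted):
    """Incomplete solution of (B kron I_n + I_m kron T) x = f.

    f_blocks: dict {j: f_j} with only the nonzero block components (0-based j).
    wanted:   list of block indices j' whose solution blocks x_{j'} are needed.
    Returns a dict {j': x_{j'}}.
    """
    m, n = B.shape[0], T.shape[0]
    # Step 0: eigenpairs of B; the columns of Q are orthonormal eigenvectors q_k.
    lam, Q = np.linalg.eigh(B)
    # Step 1: Fourier coefficients beta_k = sum_s q_{j_s,k} f_{j_s} (a vector in R^n).
    beta = np.zeros((n, m))
    for j, fj in f_blocks.items():
        beta += np.outer(fj, Q[j, :])
    # Step 2: m independent n-by-n systems (lambda_k I_n + T) eta_k = beta_k.
    eta = np.empty((n, m))
    for k in range(m):
        eta[:, k] = np.linalg.solve(lam[k] * np.eye(n) + T, beta[:, k])
    # Step 3: recover only the requested blocks x_j = sum_k q_{j,k} eta_k.
    return {j: eta @ Q[j, :] for j in wanted}
```

Only the d rows of Q matching nonzero blocks of f and the r rows matching requested blocks of x are touched, which is where the saving over a full separation-of-variables solve comes from.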

2.2 Marching Algorithm (MA) 

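We now describe the standard marching algorithm. The first block equation of
the system (2) is placed at the bottom and the reordered system is rewritten in
the following two-by-two block form:

$$\begin{pmatrix} U & G \\ C & 0 \end{pmatrix}
\begin{pmatrix} \mathbf{x}' \\ \mathbf{x}_m \end{pmatrix} =
\begin{pmatrix} \mathbf{f}' \\ \mathbf{f}_1 \end{pmatrix}. \qquad (3)$$

Here, U is an upper triangular matrix of the following form:

$$U = \begin{pmatrix}
b_{2,1} I_n & T + b_{2,2} I_n & b_{2,3} I_n & \cdots & 0 \\
0 & b_{3,2} I_n & T + b_{3,3} I_n & \ddots & \vdots \\
\vdots & & \ddots & \ddots & \\
0 & \cdots & 0 & b_{m-1,m-2} I_n & T + b_{m-1,m-1} I_n \\
0 & \cdots & \cdots & 0 & b_{m,m-1} I_n
\end{pmatrix}.$$

The remaining block matrices and vectors read as follows:

$$G = (0, \dots, 0, b_{m-1,m} I_n, T + b_{m,m} I_n)^T, \qquad C = (T + b_{1,1} I_n, b_{1,2} I_n, 0, \dots, 0),$$

$$\mathbf{x}' = (\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_{m-1})^T, \qquad \mathbf{f}' = (\mathbf{f}_2, \mathbf{f}_3, \dots, \mathbf{f}_m)^T.$$

In order to find the solution of (3), one makes use of the following block
factored form of the matrix,

$$\begin{pmatrix} U & G \\ C & 0 \end{pmatrix} =
\begin{pmatrix} U & 0 \\ C & -CU^{-1}G \end{pmatrix}
\begin{pmatrix} I & U^{-1}G \\ 0 & I \end{pmatrix}. \qquad (4)$$

This system is solved by successive solution of the following systems:

$$U\mathbf{y}_1 = \mathbf{f}', \qquad
-CU^{-1}G\,\mathbf{x}_m = \tilde{\mathbf{f}}_1 \equiv \mathbf{f}_1 - C\mathbf{y}_1, \qquad
U\mathbf{x}' = -G\mathbf{x}_m + \mathbf{f}'. \qquad (5)$$

The standard backward recurrence is employed to solve systems with U. That is,
consider $U\boldsymbol{\xi} = \mathbf{g}$, where $\boldsymbol{\xi} = (\boldsymbol{\xi}_1, \boldsymbol{\xi}_2, \dots, \boldsymbol{\xi}_{m-1})^T$, $\mathbf{g} = (\mathbf{g}_1, \mathbf{g}_2, \dots, \mathbf{g}_{m-1})^T$. Then,
with the convention $\boldsymbol{\xi}_m = 0$,

    $\boldsymbol{\xi}_{m-1} = b_{m,m-1}^{-1}\, \mathbf{g}_{m-1}$
    for i = m - 1 down to 2
        $\boldsymbol{\xi}_{i-1} = b_{i,i-1}^{-1} \left( \mathbf{g}_{i-1} - (T + b_{i,i} I_n)\, \boldsymbol{\xi}_i - b_{i,i+1}\, \boldsymbol{\xi}_{i+1} \right)$
    end.

As is readily seen, the computational cost is O(nm) operations.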
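
The backward block recurrence for systems with U can be sketched as follows. This is a minimal illustration under stated assumptions: the name `march_solve_U` is hypothetical, the scalar coefficients $b_{i,j}$ are taken from a dense array `b` holding B, and the block $\xi_m$ is set to zero as a convention so that the loop body is uniform.

```python
import numpy as np

def march_solve_U(b, T, g):
    """Solve U xi = g by the backward block recurrence.

    b: (m, m) array holding the tridiagonal matrix B (entry b[i-1, j-1] is b_{i,j}).
    T: (n, n) tridiagonal block.
    g: (m-1, n) array whose rows are the blocks g_1, ..., g_{m-1}.
    Returns xi as an (m-1, n) array with rows xi_1, ..., xi_{m-1}.
    """
    m, n = b.shape[0], T.shape[0]
    xi = np.zeros((m + 1, n))            # rows 1..m-1 are used; xi[m] stays 0
    xi[m - 1] = g[m - 2] / b[m - 1, m - 2]
    for i in range(m - 1, 1, -1):        # i = m-1 down to 2 (1-based, as in the text)
        rhs = g[i - 2] - (T @ xi[i] + b[i - 1, i - 1] * xi[i]) - b[i - 1, i] * xi[i + 1]
        xi[i - 1] = rhs / b[i - 1, i - 2]
    return xi[1:m]
```

Each step costs one multiplication by the tridiagonal T plus a few vector operations, i.e. O(n) work per block and O(nm) in total. Each step also multiplies the previous block by $(T + b_{i,i} I_n)/b_{i,i-1}$, which is the source of the error growth for long recurrences.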

The system with the Schur complement $-CU^{-1}G$ is equivalent to the
incomplete solution of a system with the original matrix A and with a sparse
right-hand side with only one non-zero block component; namely,

$$A\hat{\mathbf{x}} = (\tilde{\mathbf{f}}_1, 0, \dots, 0)^T. \qquad (6)$$

The only block component which is needed is $\hat{\mathbf{x}}_m = \mathbf{x}_m$. This is seen by the
following argument: one may apply the same reordering and block factorization
to (6) in the same manner as to the original system (2). Since now $\mathbf{f}' = 0$, the
resulting system (5) reduces to $-CU^{-1}G\,\hat{\mathbf{x}}_m = \tilde{\mathbf{f}}_1$.

To solve the last system (6), Algorithm SRHS is used. Here d = 1, $j_1 = 1$
and r = 1, $j'_1 = m$. According to Proposition 21 this step of MA requires O(nm)
operations.

For the computational complexity of MA we summarize: 
Theorem 1. The marching algorithm in combination with the separation of
variables technique for the incomplete solution of the reduced system requires
an optimal cost of operations for solving problems with separable variables (2);
namely, the cost is O(nm) operations.



2.3 The Generalized Marching Algorithm (GMA) 

In [2,3] it is demonstrated that the recurrence used for solving systems with
the upper triangular matrix U is unstable and hence the marching algorithm is
unstable for large m. This makes the marching algorithm of practical interest
only if the length of this recurrence is small, i.e., for $m \ll n$.

When m = n or is of the same order, to solve problem (2) one may use the
generalized marching algorithm (GMA), which we now describe.

For ease of presentation, let m = n, and n + 1 = p(k + 1) for some integers p
and k. Consider now the following reordering of the unknown vector x: all block
components of x whose index is a multiple of k + 1 form its second block component $\mathbf{x}^{(2)}$.

This reordering of x induces a symmetric block odd-even reordering
of the original matrix A; namely, one gets a reordered form $\tilde{A}$ of A, where
$\tilde{A} = (A_{i,j})_{i,j=1}^{2}$. Note that the first block $A_{1,1}$ is block-diagonal, since the block
$\mathbf{x}^{(2)}$ is a separator; it partitions the n × n grid into p strips with k grid lines each.
More specifically, the following block form of $\tilde{A}$ is obtained:






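$$\tilde{A} = \begin{pmatrix}
A_1^{(k)} & & 0 & \\
 & \ddots & & A_{1,2} \\
0 & & A_p^{(k)} & \\
 & A_{2,1} & & A_{2,2}
\end{pmatrix} =
\begin{pmatrix} A_{1,1} & A_{1,2} \\ A_{2,1} & A_{2,2} \end{pmatrix},$$

where the corresponding blocks are defined by:

$$A_{1,1} = \mathrm{blockdiag}\left(A_s^{(k)}\right)_{s=1}^{p},$$

$$A_s^{(k)} = I_k \otimes T + B_s^{(k)} \otimes I_n,$$

$$B_s^{(k)} = \mathrm{tridiag}\left(b_{k_s+i,\,k_s+i-1},\; b_{k_s+i,\,k_s+i},\; b_{k_s+i,\,k_s+i+1}\right)_{i=1}^{k},$$

$$A_{2,2} = \mathrm{blockdiag}\left(T + b_{k_{s+1},\,k_{s+1}} I_n\right)_{s=1}^{p-1}, \qquad k_s = (s - 1)(k + 1), \quad s = 1, \dots, p.$$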



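The components of the solution vector x and the right-hand side vector f are
grouped as follows:

$$\mathbf{x}^{(1)} = \left(\mathbf{x}_1^{(1)}, \dots, \mathbf{x}_p^{(1)}\right)^T, \qquad
\mathbf{x}_s^{(1)} = \left(\mathbf{x}_{(s-1)(k+1)+1}, \dots, \mathbf{x}_{(s-1)(k+1)+k}\right)^T, \qquad s = 1, \dots, p,$$

$$\mathbf{f}^{(1)} = \left(\mathbf{f}_1^{(1)}, \dots, \mathbf{f}_p^{(1)}\right)^T, \qquad
\mathbf{f}_s^{(1)} = \left(\mathbf{f}_{(s-1)(k+1)+1}, \dots, \mathbf{f}_{(s-1)(k+1)+k}\right)^T, \qquad s = 1, \dots, p,$$

$$\mathbf{x}^{(2)} = \left(\mathbf{x}_{k+1}, \dots, \mathbf{x}_{s(k+1)}, \dots, \mathbf{x}_{(p-1)(k+1)}\right)^T,$$

$$\mathbf{f}^{(2)} = \left(\mathbf{f}_{k+1}, \dots, \mathbf{f}_{s(k+1)}, \dots, \mathbf{f}_{(p-1)(k+1)}\right)^T.$$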



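The reordered matrix $\tilde{A}$ allows block factorization and the problem now has the
form:

$$\begin{pmatrix} A_{1,1} & 0 \\ A_{2,1} & S \end{pmatrix}
\begin{pmatrix} I & A_{1,1}^{-1} A_{1,2} \\ 0 & I \end{pmatrix}
\begin{pmatrix} \mathbf{x}^{(1)} \\ \mathbf{x}^{(2)} \end{pmatrix} =
\begin{pmatrix} \mathbf{f}^{(1)} \\ \mathbf{f}^{(2)} \end{pmatrix},$$

where $S = A_{2,2} - A_{2,1} A_{1,1}^{-1} A_{1,2}$ is the Schur complement of $\tilde{A}$. Two systems with
the block-diagonal matrix $A_{1,1}$ and one with the Schur complement,

$$\left(A_{2,2} - A_{2,1} A_{1,1}^{-1} A_{1,2}\right) \mathbf{x}^{(2)} = \tilde{\mathbf{f}}^{(2)} \equiv \mathbf{f}^{(2)} - A_{2,1} A_{1,1}^{-1} \mathbf{f}^{(1)},$$
$$A_{1,1}\, \mathbf{x}^{(1)} = \mathbf{f}^{(1)} - A_{1,2}\, \mathbf{x}^{(2)}, \qquad (7)$$

have to be solved to compute the solution of the original system.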

The systems with $A_{1,1}$ are solved by applying p times the standard marching
algorithm for the subproblems with $A_s^{(k)}$. This procedure requires O(npk)
arithmetic operations.

The length of the recurrence needed for solving systems with the upper
triangular blocks is $k - 1 = \frac{n+1}{p} - 2$, and can be controlled by choosing sufficiently
large p.

The system with the Schur complement is equivalent to the incomplete solution
of a system with the original matrix and with a sparse right-hand side; namely,
the system

$$A\hat{\mathbf{x}} = \hat{\mathbf{f}}, \qquad \text{where } \hat{\mathbf{f}}_i =
\begin{cases} \tilde{\mathbf{f}}_s^{(2)}, & i = s(k + 1), \\ 0, & i \ne s(k + 1), \end{cases} \qquad (8)$$

has to be solved incompletely, seeking only $\hat{\mathbf{x}}_{s(k+1)} = \mathbf{x}_{s(k+1)}$, $s = 1, \dots, p - 1$.
The last problem is handled by the fast direct solver called FASV and proposed
in [6] (a detailed description of FASV may be found in [4]).

Let for definiteness $p = 2^l$. Since we have to perform l steps of the algorithm
FASV, the cost for the incomplete solution of (8) by algorithm FASV is given
by the following theorem:

Theorem 2. The second step of the block-Gaussian elimination based on the
incomplete solution of problem (8) using algorithm FASV requires $24n^2 l - 9n^2$
operations.



Remark 2. The generalized marching algorithm in the form presented in Bank [2]
requires in the second step of the block-Gaussian elimination $28n^2 l$ operations.
This shows that the algorithm proposed in [5] and presented here has asymptotically
a slightly smaller operation count.



Summarizing, one may see that the implementation of the generalized marching
algorithm requires a total cost of

$$O(npk) + 24n^2 l - 9n^2 = O(n^2) + 24n^2 \log_2(p) - 9n^2$$

arithmetic operations.
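
The interplay between n, p, k and the operation count can be made concrete with a small helper. This is only a sketch: the function name `gma_parameters` is hypothetical, it assumes $n + 1 = p(k + 1)$ with $p = 2^l$ as in the text, and it takes the count of Theorem 2 to read $24n^2 l - 9n^2$.

```python
def gma_parameters(n, k):
    """Return (p, l, recurrence_length, second_step_ops) for n + 1 = p (k + 1), p = 2**l."""
    if (n + 1) % (k + 1) != 0:
        raise ValueError("n + 1 must be divisible by k + 1")
    p = (n + 1) // (k + 1)
    l = p.bit_length() - 1               # exact log2 for a power of two
    if 2 ** l != p:
        raise ValueError("this sketch assumes p is a power of two")
    ops = 24 * n**2 * l - 9 * n**2       # Schur-complement step, as read from Theorem 2
    return p, l, k - 1, ops
```

For example, `gma_parameters(1023, 3)` gives p = 256 strips with a recurrence of length 2, while `gma_parameters(1023, 15)` gives p = 64 strips with a recurrence of length 14: fewer strips lower the logarithmic factor but lengthen the unstable recurrence.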
3 Numerical Experiments

To experimentally compare the numerical stability of the described algorithms, 
some numerical tests have been performed (using the HP9000/C110 computer) 
for the following test problem. 

Example: Application to the case $a_1(x_1) = a_2(x_2) = 1$, i.e., the two-dimensional
Poisson equation with Dirichlet boundary conditions is considered:

$$-\Delta u(x_1, x_2) = f(x), \qquad x \in \Omega = (0,1) \times (0,1),$$
$$u = 0 \quad \text{on } \partial\Omega.$$

The solution is $u(x_1, x_2) = \sin(\pi x_1) \sin(\pi x_2)$, which implies that the right-hand
side is $f(x_1, x_2) = 2\pi^2 \sin(\pi x_1) \sin(\pi x_2)$.

This problem is discretized using the five-point finite difference scheme on a
uniform n × n (m = n, $h_1 = h_2 = h$) mesh with mesh parameter h = 1/(n + 1).
The discrete problem is solved using MA and GMA and the results are collected
in separate tables (Table 1 and Table 2) for each algorithm, respectively.
The first column of both tables shows the mesh size n. In the remaining two
columns of Table 1, the $l_2$- and C-norms of the vector of the pointwise error of
the solution computed by MA are given. It is clearly seen from this table that
for this example MA is stable only if m ≤ 15.

Table 1. Results for algorithm MA

     n    l2-error     Max. error
     3    2.651e-02    5.303e-02
     7    6.475e-03    1.295e-02
    15    1.609e-03    3.219e-03
    31    6.086e+08    5.611e+09

The columns of Table 2 are in groups of 3 with similar data as in Table 1. Here
we vary the step size k - 1 of the recursion in the GMA. The first group contains
the results obtained by GMA for k = 3. At any refinement step both the $l_2$- and
C-norms of the error in this case decrease 4 times, which means that the algorithm
is stable. The results were similar for the case k = 7 (not shown in the table).
The second group of columns in Table 2 shows that in the case k = 15 the
stability of GMA is affected, but still the results are acceptable. That is, in
practice the value of the parameter k has to be chosen very carefully depending
upon the specific floating point arithmetic, i.e., it is machine dependent.

Table 2. Results for algorithm GMA (n = p(k + 1) - 1)

                      k = 3                             k = 15
     n      p    l2-error     Max. error       p    l2-error     Max. error
     7      2    6.475e-03    1.295e-02
    15      4    1.609e-03    3.219e-03
    31      8    4.018e-04    8.036e-04        2    6.950e-04    5.653e-03
    63     16    1.004e-04    2.008e-04        4    4.601e-04    6.329e-03
   127     32    2.510e-05    5.020e-05        8    1.064e-04    1.900e-03
   255     64    6.275e-06    1.255e-05       16    2.558e-05    5.469e-04
   511    128    1.569e-06    3.137e-06       32    6.397e-06    1.369e-04
  1023    256    3.922e-07    7.844e-07       64    1.798e-06    4.371e-05

4 Concluding Remarks 

As demonstrated by the complexity estimates and numerical experiments, the 
advantage of GMA is clearly seen. In practice, apart from solving separable el- 
liptic problems in a single rectangular domain, the development of fast elliptic 
solvers is strongly motivated by their potential application to the construction 
of efficient preconditioners for iterative solution of more general problems on 
more general domains and meshes, such as: problems with slowly varying coef- 
ficients and/or jumping coefficients corresponding to the case of multi-layer me- 
dia; composite domains, e.g., L-shaped or T-shaped domains; as well as block- 
preconditioning of coupled systems of partial differential equations including 
elasticity and Stokes problems. In particular, a further implementation of the
marching algorithms considered here, together with the algorithms analyzed in [4], in the
framework of domain decomposition and domain embedding methods is of
interest in order to handle more general elliptic problems on more realistic general
domains and meshes.
Abstract. In this paper we present the results obtained in the solution
of sparse and large systems of nonlinear equations by Inexact Newton-like 
methods [6]. The linearized systems are solved with two preconditioners 
particularly suited for parallel computation. We report the results for the 
solution of some nonlinear problems on the CRAY T3E under the MPI 
environment. Our methods may be used to solve more general problems. 
Due to the presence of a logarithmic penalty, the interior point solu- 
tion [10] of a nonlinear mixed complementary problem [7] can indeed be 
viewed as a variant of an Inexact Newton method applied to a particular 
system of nonlinear equations. We have applied this inexact interior point 
algorithm for the solution of some nonlinear complementary problems. 
We provide numerical results in both sequential and parallel implemen- 
tations. 



1 The Inexact Newton-Cimmino Method 

Consider the system of nonlinear equations

$$G(x) = 0, \qquad G = (g_1, \dots, g_n)^T, \qquad (1)$$

where $G : \mathbb{R}^n \to \mathbb{R}^n$ is a nonlinear $C^1$ function with Jacobian matrix $J(x)$.
For solving (1) we use an iterative procedure which combines a Newton and
a Quasi-Newton method with a row-projection (or row-action) linear solver of
Cimmino type [11], particularly suited for parallel computation. Below,
referring to the block Cimmino method, we outline this procedure.

Let $As = b$ be the linearized system to be solved. Let us partition A into p
row-blocks $A_i$, $i = 1, \dots, p$, i.e. $A^T = [A_1^T, A_2^T, \dots, A_p^T]$, and partition the vector b
conformally. Then the original system is premultiplied (preconditioned) by

$$H_p = [A_1^+, \dots, A_i^+, \dots, A_p^+], \qquad (2)$$

where $A_i^+ = A_i^T (A_i A_i^T)^{-1}$ is the Moore-Penrose pseudoinverse of $A_i$.

We obtain the equivalent system $H_p A s = H_p b$,

$$(P_1 + \cdots + P_p)\, s = \sum_{i=1}^{p} A_i^+ A_i\, s = \sum_{i=1}^{p} A_i^+ b_i = H_p b, \qquad (3)$$

where for each $i = 1, \dots, p$, $P_i = A_i^+ A_i$ is the orthogonal projection onto
range($A_i^T$). As A is nonsingular, the matrix $H_p A$ is symmetric and
positive definite. Then the solution of (3) is approximated by the Conjugate
Gradient (CG) method. The p (underdetermined) linear least squares subproblems
in the pseudoresidual unknowns $\delta_{i,k}$,

$$A_i\, \delta_{i,k} = (b_i - A_i s_k), \qquad 1 \le i \le p, \qquad (4)$$

must be solved at each conjugate gradient iteration (k = 1, 2, ...).
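
The block Cimmino preconditioning can be illustrated on a small dense system. The sketch below is a simplification under stated assumptions: the function names are hypothetical, the rows are split into p contiguous blocks, and $H_p A$ is formed explicitly, whereas in the setting described above the projections are applied blockwise (and in parallel) inside each CG iteration.

```python
import numpy as np

def cimmino_operator(A, b, p):
    """Return (H_p A, H_p b) for a partition of A into p contiguous row blocks."""
    n = A.shape[0]
    HA, Hb = np.zeros((n, n)), np.zeros(n)
    for rows in np.array_split(np.arange(n), p):
        Ai, bi = A[rows], b[rows]
        Ai_pinv = Ai.T @ np.linalg.inv(Ai @ Ai.T)   # A_i^+ = A_i^T (A_i A_i^T)^{-1}
        HA += Ai_pinv @ Ai                          # P_i = A_i^+ A_i
        Hb += Ai_pinv @ bi
    return HA, Hb

def conjugate_gradient(M, rhs, tol=1e-12, maxit=1000):
    """Plain CG for a symmetric positive definite matrix M."""
    x = np.zeros_like(rhs)
    r = rhs - M @ x
    d = r.copy()
    rr = r @ r
    for _ in range(maxit):
        if np.sqrt(rr) <= tol * np.linalg.norm(rhs):
            break
        Md = M @ d
        alpha = rr / (d @ Md)
        x += alpha * d
        r -= alpha * Md
        rr_new = r @ r
        d = r + (rr_new / rr) * d
        rr = rr_new
    return x
```

Since each $P_i$ is an orthogonal projection, the sum $H_p A$ is symmetric with eigenvalues in (0, p] when A is nonsingular, which is what makes CG applicable even though A itself is nonsymmetric.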

Combining the classic Newton method and the block Cimmino method we
obtain the block Inexact Newton-Cimmino algorithm [11], in which at a major
outer iteration the linear system $J(x_k)s = -G(x_k)$, where $J(x_k)$ is the Jacobian
matrix, is solved in parallel by the block Cimmino method.

In [12,4] a simple p-block partitioning of the Jacobian matrix A was used 
for solving in parallel a set of nonlinear test problems with sizes ranging from 
1024 to 131072 on a CRAY T3E under the MPI environment. The least squares 
subproblems (4) were solved concurrently with the iterative Lanczos algorithm 
LSQR. 

In this paper (see section 4) we adopt a suitable block row partitioning of
the matrix A in such a way that $A_i A_i^T = I$, $i = 1, \dots, p$, and consequently $A_i^+ = A_i^T$.
This simplifies the solution of the subproblems (4).

Due to the costly communication routines needed in this approach we have 
also implemented in parallel the preconditioned BiCGstab for the solution of the 
linearized system. As the preconditioner we choose AINV [2] which is based on 
the approximate sparse computation of the inverse of the coefficient matrix. 

2 Inexact Newton Method for Nonlinear Complementary 
Problems 

The methods of section 1 may be used to solve more general problems, such as
nonlinear mixed complementary problems [7] (including linear and nonlinear
programming problems, variational inequalities, control problems, etc.).

Let us consider the following system of constrained equations:

$$F(v, s, z) = \begin{pmatrix} G(v, s, z) \\ SZe \end{pmatrix} = 0, \qquad (s, z) \ge 0, \qquad (5)$$

where G is a nonlinear function of v, $S = \mathrm{diag}(s_1, \dots, s_m)$,
$Z = \mathrm{diag}(z_1, \dots, z_m)$, $e = (1, \dots, 1)^T$. The interior point methods [10] for the
solution of (5) require the solution of the nonlinear system F(x) = 0. Using the
Newton method we have to solve at every iteration a linear system of the form

$$F'(x_k)\, \Delta x = -F(x_k) + \sigma_k \mu_k e_0, \qquad (6)$$

where $\mu_k = (s_k^T z_k)/m$, $\sigma_k \in\, ]0, 1[$, that is, an Inexact Newton method [6].
An interior point method in which the linearized system is solved approximately
(by means of an iterative method) will be called an inexact (truncated)
interior point method. In this framework system (6) becomes

$$F'(x_k)\, \Delta x = -F(x_k) + \sigma_k \mu_k e_0 + r_k, \qquad (7)$$

where $r_k$ is the residual of the iterative method applied to the linear system,
satisfying $\|r_k\| \le \eta_k \mu_k$, and $\eta_k$ is, for every k, the forcing term of the inexact
Newton method [6]. Global convergence is assured by means of backtracking [1].

3 Numerical Results I (Sequential) 

We have applied the inexact interior point methods for the solution of two non- 
linear complementary problems: the (sparse) obstacle Bratu problem [5,8], and 
the (dense) Lubrication problem [9,5]. 

3.1 The Obstacle Bratu Problem 

This problem can be formulated as a nonlinear system of equations:

$$f(v) = z_1 - z_2, \quad Z_1 S_1 e = 0, \quad Z_2 S_2 e = 0, \quad s_1 = v - v_l, \quad s_2 = v_u - v, \qquad (8)$$

with the constraint $s_i, z_i \ge 0$, i = 1, 2. The nonlinear function f(v) is defined as

$$f(v) = Av - \lambda h^2 E(v) e, \qquad E(v) = \mathrm{diag}(\exp(v_1), \dots, \exp(v_n)),$$

where A is the matrix arising from FD discretization of the Laplacian on the
unit square with homogeneous Dirichlet boundary conditions, $v_l$, $v_u$ are the
obstacles and h is the grid spacing.

The system (7) at step k can be written as

$$\begin{pmatrix}
B & 0 & 0 & -I & I \\
0 & Z_1 & 0 & S_1 & 0 \\
0 & 0 & Z_2 & 0 & S_2 \\
-I & I & 0 & 0 & 0 \\
I & 0 & I & 0 & 0
\end{pmatrix}
\begin{pmatrix} \Delta v \\ \Delta s_1 \\ \Delta s_2 \\ \Delta z_1 \\ \Delta z_2 \end{pmatrix} =
\begin{pmatrix}
-f + z_1 - z_2 \\
-Z_1 S_1 e + \sigma_k \mu_k e \\
-Z_2 S_2 e + \sigma_k \mu_k e \\
-s_1 + v - v_l \\
-s_2 + v_u - v
\end{pmatrix} \equiv
\begin{pmatrix} b_1 \\ b_2 \\ b_3 \\ b_4 \\ b_5 \end{pmatrix},$$

where $B = f'(v)$. Taking into account the simple structure of some of the block
matrices, we can use a Schur complement approach to reduce the original system
(5n × 5n) to a system with n rows and n columns. In this way we obtain a system
in the $\Delta v$ unknown only: $C \Delta v = r$, where

$$C = B + S_1^{-1} Z_1 + S_2^{-1} Z_2, \qquad r = b_1 + S_1^{-1}(b_2 - Z_1 b_4) - S_2^{-1}(b_3 - Z_2 b_5).$$

Once this nonsymmetric system has been solved (we used the BiCGstab solver),
we can compute $\Delta z_1$, $\Delta z_2$ and $\Delta s_1$, $\Delta s_2$ by:

$$\Delta s_1 = b_4 + \Delta v, \qquad \Delta z_1 = S_1^{-1}(b_2 - Z_1 \Delta s_1),$$
$$\Delta s_2 = b_5 - \Delta v, \qquad \Delta z_2 = S_2^{-1}(b_3 - Z_2 \Delta s_2).$$

We may note that the matrix C is obtained by adding to B the two nonnegative
diagonal matrices $S_1^{-1} Z_1$ and $S_2^{-1} Z_2$, thus enhancing its diagonal dominance.

Table 1. Results obtained with the inexact interior point Newton method for
the obstacle Bratu problem with three different mesh sizes and four values of λ
(nl = nonlinear, it = iterations, s = seconds on a 600 MHz Alpha workstation)

   λ       n    nl it.   tot lin it.   CPU (s)   ||F||
   1    1024      13         397         0.24    0.12371E-09
   4    1024      11         335         0.23    0.21711E-08
   6    1024      12         374         0.25    0.13904E-12
   1    4096      14         873         2.61    0.12668E-08
   4    4096      13         828         2.56    0.13289E-10
   6    4096      12         840         2.57    0.44356E-11
   1   16384      15        1779        36.21    0.34762E-08
   4   16384      14        1700        34.46    0.30286E-09
   6   16384      14        2021        40.68    0.79480E-09
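
The Schur complement reduction can be checked numerically on a small example. The sketch below is illustrative only: the function name `schur_step` is hypothetical, the diagonal matrices are stored as vectors, and a dense direct solve replaces BiCGstab. The update formulas follow from substituting $\Delta s_1 = b_4 + \Delta v$ and $\Delta s_2 = b_5 - \Delta v$ (fourth and fifth block rows) into the second and third block rows.

```python
import numpy as np

def schur_step(B, s1, s2, z1, z2, b1, b2, b3, b4, b5):
    """Solve the 5n x 5n interior point system by reduction to C dv = r.

    s1, s2, z1, z2 hold the (positive) diagonals of S_1, S_2, Z_1, Z_2.
    Returns (dv, ds1, ds2, dz1, dz2).
    """
    C = B + np.diag(z1 / s1) + np.diag(z2 / s2)
    r = b1 + (b2 - z1 * b4) / s1 - (b3 - z2 * b5) / s2
    dv = np.linalg.solve(C, r)        # dense solve here; an iterative solver in practice
    ds1 = b4 + dv                     # from  -dv + ds1 = b4
    ds2 = b5 - dv                     # from   dv + ds2 = b5
    dz1 = (b2 - z1 * ds1) / s1        # from  Z1 ds1 + S1 dz1 = b2
    dz2 = (b3 - z2 * ds2) / s2        # from  Z2 ds2 + S2 dz2 = b3
    return dv, ds1, ds2, dz1, dz2
```

The reduced matrix has the sparsity pattern of B, so only an n × n sparse system is passed to the Krylov solver; the remaining four unknown blocks cost O(n) each.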

The algorithm has been tested for different grids h = 1/32, 1/64, 1/128 with
values of n = 1024, 4096, 16384, respectively, for different values of λ = 1, 4, 6, 10.
The initial vectors for the experiments are $[1, \dots, 1]^T$, with
the obstacles $v_l = [0, \dots, 0]^T$, $v_u = [4, \dots, 4]^T$. For the last λ-value we reported
a failure, since the number of backtracking steps exceeded the allowed maximum (= 5).
Actually, for λ > 6.8 the algorithm did not achieve convergence
(this result is well documented in the literature, see [8]). The sequential
results for the cases λ = 1, 4, 6 are reported in Table 1. The CPU times refer to
the computation on a 600 MHz Alpha workstation with 512 MB RAM.



3.2 The Lubrication Problem 

A very difficult problem from the point of view of nonlinearity is represented by
the Elastohydrodynamic Lubrication Problem [5], which consists of two integral
equations coupled with a differential equation, the Reynolds equation. Given
the parameters α, λ and an inlet point $x_a$, find the pressure p(x), the thickness
h(x), the free boundary $x_b$, and the variable k satisfying:

$$h(x) = x^2 + k - \frac{2}{\pi} \int_{x_a}^{x_b} p(s) \ln|x - s|\, ds \quad \text{in } [x_a, \infty),$$

$$\frac{d}{dx}\left( \frac{h^3(x)}{e^{\alpha p}} \frac{dp}{dx} \right) = \lambda \frac{dh}{dx} \quad \text{in } [x_a, x_b], \qquad
\frac{2}{\pi} \int_{x_a}^{x_b} p(s)\, ds = 1,$$

with the free boundary conditions $p(x_a) = 0$ and $p(x_b) = \frac{dp}{dx}(x_b) = 0$. The
discretization of this problem yields a highly nonlinear and dense system of
equations. We solve the linearized system with a direct method (LAPACK
routines). In Table 2 we show the results obtained with the inexact interior point
Newton method with α = 2.832, λ = 6.057 using n = 200 and n = 1000 points
of the discretization of the interval [-3, 2].

Table 2. Results obtained for the Lubrication Problem with the inexact interior
point Newton method with α = 2.832, λ = 6.057

                    CPU (s)
     n    nl it.    total    Jacobian   LU factor   LU solver   ||F||
   200      14       1.30       0.61       0.28        0.01     0.18840E-06
  1000      19      69.41      20.75      35.59        0.68     0.13107E-06

The initial vector is chosen as





hi\ 



h = 



5 

n 



4 Numerical Results II (Parallel) 

Parallel results for nonlinear problems. Here we show the results obtained
in the solution of the nonlinear system (1) applying the Newton-Cimmino
method. As we mentioned above at the end of section 1, to overcome the problem
of the costly solution of the least squares subproblems (4), we adopt a suitable
block row partitioning of the matrix A in such a way that $A_i A_i^T = I$, $i = 1, \dots, q$,
and consequently, $A_i^+ = A_i^T$. This partitioning [11] is always possible for every
sparse matrix and produces a number q of blocks $A_i$ whose rows are mutually
orthogonal. The numerical results of Table 3 were obtained on a CRAY T3E
under the MPI environment for two sparse problems (also solved in [12,4] adopting
a simple block partitioning, using the iterative algorithm LSQR), which arise
from Finite Difference discretization in the unit square Ω of the following PDEs:

1. Poisson problem —Au — = tt = 0 in C, -|- b.c. (9) 

1 + x^ + y^ ^ ' 

2. Bratu problem [8] $-\Delta u - \lambda e^{u} = 0$ in Ω, $\lambda \in \mathbb{R}$, + b.c. (10)

The linear system is solved using a tolerance £2 = 10“® while the Newton it- 
eration stops whenever the relative residual norm is less than £2 = 10“^. From 
Table 3 we can see that the speedups are not completely satisfactory, reaching 
the maximum value of 1.4 for p = 4 processors. This fact is mainly due to the 
cost of the communication routine MPI_ALLREDUCE which performs the commu- 
nication of the local pseudoresiduals and their sums on every processor. This 
operation is costly, and its cost increases with the number of processors. 
A more effective parallel solution was obtained with the use of a standard
Krylov method with AINV as a preconditioner [2]. AINV is based on the
incomplete construction of a set of biconjugate vectors. This process produces two
triangular factors Z and W and a diagonal matrix D so that $A^{-1} \approx Z D^{-1} W^T$.
Therefore, application of the preconditioner consists of two matrix-vector
products and a diagonal scaling. These matrix-vector products have been parallelized
exploiting data locality as in [3], minimizing in this way the communication
among processors. The incompleteness of the process is driven by a tolerance
parameter ε. Previous (sequential) experimental results show that a choice of
ε ∈ [0.05, 0.1] leads to a good convergence of the Krylov subspace methods,
very similar to that obtained using the ILU preconditioner. In our test cases we
choose ε = 0.05.
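
The biconjugation process behind AINV can be sketched for a small dense matrix. This is a simplified illustration, not the sparse algorithm of [2]: the function names are hypothetical, the matrix is assumed to have nonzero pivots, and dropping is modelled crudely by zeroing entries of Z and W below `drop_tol` after each step. With `drop_tol = 0` the factorization is exact, $A^{-1} = Z D^{-1} W^T$.

```python
import numpy as np

def biconjugation(A, drop_tol=0.0):
    """Return (Z, W, d) with W.T @ A @ Z ~= diag(d); exact when drop_tol == 0."""
    n = A.shape[0]
    Z, W = np.eye(n), np.eye(n)          # unit upper triangular factors, built column by column
    d = np.zeros(n)
    for i in range(n):
        d[i] = W[:, i] @ A @ Z[:, i]     # pivot w_i^T A z_i
        for j in range(i + 1, n):
            cz = (W[:, i] @ A @ Z[:, j]) / d[i]
            cw = (W[:, j] @ A @ Z[:, i]) / d[i]
            Z[:, j] -= cz * Z[:, i]      # enforce w_i^T A z_j = 0
            W[:, j] -= cw * W[:, i]      # enforce w_j^T A z_i = 0
        if drop_tol > 0.0:
            Z[np.abs(Z) < drop_tol] = 0.0
            W[np.abs(W) < drop_tol] = 0.0
    return Z, W, d

def apply_preconditioner(Z, W, d, v):
    """Compute Z D^{-1} W^T v: two matrix-vector products and a diagonal scaling."""
    return Z @ ((W.T @ v) / d)
```

Applying the preconditioner needs no triangular solves, only products with Z and $W^T$, which is why its degree of parallelism is close to that of a diagonal scaling.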

In Table 4 we show the results when BiCGstab is employed as the linear solver 
using both AINV and the diagonal scaling (Jacobi) as the preconditioners. The 
CPU time on p processors (Tp) is measured in seconds on a CRAY T3E. From 
the results we note that for the small problem (n = 4096), as expected, the
speedups $S_p$ are not very high. However, for the n = 65 536 problem they reach
a value of about 19 (AINV) and 20 (Jacobi) on 32 processors. Note that in all the



From Table 2 we note that the major part of the computation is represented by the
construction and factorization of the Jacobian matrix. This suggests that a
Quasi-Newton approach may drastically reduce the CPU time of a single iteration. In
Figure 1 the nonlinear convergence profile is provided, showing the superlinear rate
of convergence of the Inexact Newton method. Figures 2 and 3 display the plots
of the film thickness and the pressure, respectively. They compare well with the
results of the literature [9].

Fig. 1. Convergence profile

Fig. 2. Film thickness

Fig. 3. Pressure
Table 3. Time (in seconds), speedups $S_p = T_1/T_p$, number of outer and inner
iterations $k_{NEWT}$, $k_{CG}$, obtained for solving the two test problems, using the
row-orthogonal partitioning on a CRAY T3E under the MPI environment

Poisson problem, n = 4096

                     p = 1    p = 2        p = 4        p = 8        p = 16
  Time (speedup)     7.22     5.74 (1.3)   5.79 (1.3)   6.05 (1.2)   6.83 (1.1)
  k_NEWT             2        2            2            2            2
  k_CG               1123     1123         1123         1123         1123

Bratu problem, n = 4096

                     p = 1    p = 2        p = 4        p = 8        p = 16
  Time (speedup)     6.70     5.11 (1.3)   4.96 (1.4)   5.33 (1.3)   8.01 (0.8)
  k_NEWT             4        4            4            4            4
  k_CG               630      630          630          630          630

runs the CPU time needed by AINV is less than the one required by Jacobi. 
Moreover, the AINV preconditioner shows a degree of parallelism comparable 
with that of the diagonal scaling. 

Parallel results for nonlinear complementary problems. As in section 4, 
we also solved in parallel the obstacle Bratu problem (8) via the inexact inte- 
rior point method, using the BiCGstab method as linear solver with AINV and 
Jacobi as preconditioners. In Table 5 we show the results obtained. The same 
considerations of section 4 hold, even with larger speedup values. 

5 Conclusions and Future Topics 

In this paper we showed experimentally that the Inexact Newton method performs well in
solving nonlinear problems and mixed nonlinear complementary problems both
in sequential and in parallel computations. We adopted two different parallel
preconditioners in the iterative solution of sparse problems: the row-action
Cimmino method [11] and the incomplete inverse AINV [2]. While the latter obtains
good results (speedup values up to 23 with 32 processors), the former suffers
heavily from the overhead due to the MPI communication routines. Future work will
address the parallel implementation of an Inexact Quasi-Newton interior point
method applied to the solution of mixed nonlinear complementary problems.
Table 4. Results obtained on the CRAY T3E for the Bratu problem employing
the AINV and Jacobi preconditioners







                  AINV(0.05)                      Jacobi
n         p       Tp       nl    lin    Sp        Tp       nl    lin    Sp
4096      1       0.74     3     57     -         0.92     4     134    -
          2       0.47     3     57     1.58      0.57     4     133    1.61
          4       0.33     3     57     2.24      0.42     4     135    2.19
          8       0.25     3     57     2.96      0.33     4     134    2.78
16 384    1       5.06     3     107    -         5.46     4     243    -
          2       2.82     3     107    1.79      3.32     4     268    1.64
          4       1.53     3     107    3.31      1.82     4     270    3.00
          8       0.90     3     107    5.62      1.17     4     266    4.67
          16      0.61     3     107    8.29      0.74     4     257    7.38
65 536    1       40.75    4     224    -         41.56    4     497    -
          2       21.02    4     223    1.93      20.42    4     479    2.03
          4       11.00    4     225    3.70      13.31    5     582    3.12
          8       5.79     4     224    7.03      5.76     4     462    7.21
          16      3.48     4     224    11.70     3.24     4     466    12.82
          32      2.15     4     219    18.95     2.05     4     493    20.27



Table 5. Results obtained on a CRAY T3E for the obstacle Bratu problem 
employing the AINV and Jacobi preconditioners 









                  AINV(0.05)                       Jacobi
n         p       Tp        nl    lin     Sp       Tp        nl    lin     Sp
4096      1       5.30      14    402     -        5.82      14    946     -
          2       2.96      14    405     1.73     3.23      14    938     1.76
          4       1.69      14    400     2.75     1.99      14    941     2.60
          8       1.05      14    400     3.89     1.23      14    948     3.69
          16      0.73      14    403     4.83     0.92      14    943     4.72
16 384    1       41.66     15    823     -        46.05     15    1959    -
          2       21.56     15    820     1.93     24.11     15    1943    1.91
          4       11.10     15    795     3.75     13.12     15    1967    3.51
          8       6.21      15    819     6.71     7.13      15    1968    6.46
          16      3.75      15    828     11.11    4.43      15    1970    10.40
          32      2.56      15    821     16.27    2.88      15    1950    15.98
65 536    1       321.32    16    1688    -        346.82    16    4026    -
          2       162.25    16    1651    1.93     180.30    16    4022    1.92
          4       84.64     16    1676    3.79     94.47     16    4022    3.67
          8       50.22     16    1672    6.39     51.39     16    4046    6.74
          16      24.31     16    1706    13.21    28.28     16    4043    12.26
          32      14.04     16    1675    22.88    15.05     16    4027    23.04















L. Bergamaschi and G. Zilli 



References 

1. S. Bellavia, An Inexact Interior Point method, Journal of Optimization Theory and Applications, vol. 96, 1 (1998).

2. M. Benzi and M. Tuma, A sparse approximate inverse preconditioner for nonsymmetric linear systems, SIAM J. Sci. Comput., 19 (1998), pp. 968-994.

3. L. Bergamaschi and M. Putti, Efficient parallelization of preconditioned conjugate gradient schemes for matrices arising from discretizations of diffusion equations. In Proceedings of the Ninth SIAM Conference on Parallel Processing for Scientific Computing, March 1999 (CD-ROM).

4. L. Bergamaschi, I. Moret, and G. Zilli, Inexact block Quasi-Newton methods for sparse systems of nonlinear equations, Future Generat. Comput. Sys. (2000) (in print).

5. I. Bongartz, A. R. Conn, N. I. M. Gould and P. L. Toint, CUTE: Constrained and unconstrained testing environment. Research Report, IBM T. J. Watson Research Center, Yorktown Heights, NY, 1993.

6. R. S. Dembo, S. C. Eisenstat, and T. Steihaug, Inexact Newton methods, SIAM J. Numer. Anal. 19, 400-408, 1982.

7. S. P. Dirkse and M. C. Ferris, MCPLIB: A collection of nonlinear mixed complementary problems, Tech. Rep., CS Dept., University of Wisconsin, Madison, WI, 1994.

8. D. R. Fokkema, G. L. G. Sleijpen and H. A. Van der Vorst, Accelerated Inexact Newton schemes for large systems of nonlinear equations, SIAM J. Sci. Comput., 19 (2), 657-674, 1997.

9. M. M. Kostreva, Elasto-hydrodynamic lubrication: A non-linear complementary problem. International Journal for Numerical Methods in Fluids, 4:377-397, 1984.

10. S. J. Wright, Primal-Dual Interior-Point Methods, SIAM, Philadelphia, 1997.

11. G. Zilli, Parallel method for sparse non-symmetric linear and non-linear systems of equations on a transputer network. Supercomputer, 66-XII-4, 4-15, 1996.

12. G. Zilli and L. Bergamaschi, Parallel Newton methods for sparse systems of nonlinear equations, Rendiconti del Circolo Matematico di Palermo II-58, 247-257, 1999.



Skew-Circulant Preconditioners for Systems of 
LMF-Based ODE Codes 



Daniele Bertaccini^1,* and Michael K. Ng^2,**

^1 Dipartimento di Matematica, University of Firenze
viale Morgagni, 67/a, 50134 Firenze, Italy
bertaccini@na-net.ornl.gov

^2 Department of Mathematics, The University of Hong Kong
Pokfulam Road, Hong Kong
mng@maths.hku.hk



Abstract. We consider the solution of ordinary differential equations 
(ODEs) using implicit linear multistep formulae (LMF). More precisely, 
here we consider Boundary Value Methods. These methods require the 
solution of one or more unsymmetric, large and sparse linear systems. 

In [6], Chan et al. proposed using Strang block-circulant preconditioners for solving these linear systems. However, as observed in [1], Strang preconditioners can often be ill-conditioned or singular even when the given system is well-conditioned. In this paper, we propose a nonsingular skew-circulant preconditioner for systems of LMF-based ODE codes. Numerical results are given to illustrate the effectiveness of our method.

1 Introduction 

In this paper, we consider the solution of ordinary differential equations (ODEs) by using implicit Linear Multistep Formulae (LMF). By applying the above formulae, the solution to a given ODE is given by the solution of a linear system

$$My = b, \qquad (1)$$

where $M$ depends on the LMF used.

* Research supported in part by Italian Ministry of Scientific Research.

** Research supported in part by Hong Kong Research Grants Council Grant No. HKU 7147/99P and UK/HK Joint Research Scheme Grant No. 20009819.

Here, we concentrate on the linear initial value problem

$$\frac{dy(t)}{dt} = J_m\, y(t) + g(t), \quad t \in (t_0, T], \qquad y(t_0) = z, \qquad (2)$$

where $y(t), g(t) : \mathbb{R} \to \mathbb{R}^m$, $z \in \mathbb{R}^m$, and $J_m \in \mathbb{R}^{m \times m}$, integrated using Boundary Value Methods (BVMs), a class of numerical methods based on the linear multistep formulae (LMF) (see [4] and references therein). A BVM approximates its solution by means of a discrete boundary value problem. By using a $\mu$-step LMF over a uniform mesh $t_j = t_0 + jh$, for $0 \le j \le s$, with $h = (T - t_0)/s$, we have



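$$\sum_{j=-\nu}^{\mu-\nu} \alpha_{j+\nu}\, y_{n+j} = h \sum_{j=-\nu}^{\mu-\nu} \beta_{j+\nu}\, f_{n+j}, \qquad n = \nu, \ldots, s - \mu + \nu. \qquad (3)$$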



Here, $y_n$ is the discrete approximation to $y(t_n)$, $f_n = J_m y_n + g_n$ and $g_n = g(t_n)$.

The BVM in (3) must be used with $\nu$ initial conditions and $\mu - \nu$ final conditions. That is, we need the values $y_0, \ldots, y_{\nu-1}$ at $t = t_0$ and the values $y_{s-\mu+\nu+1}, \ldots, y_s$ at $t = T$. The initial condition in (2) only provides us with one value. In order to obtain the other initial and final values, we have to provide $(\mu - 1)$ additional equations. The coefficients $\alpha_j^{(i)}$ and $\beta_j^{(i)}$ of these equations should be chosen such that the truncation errors for these initial and final conditions are of the same order as that in (3), see [4, p. 132]. By combining (3) with the additional methods, we obtain a linear system as in (1).

The discrete problem (1) generated by the above process is given by 



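$$My \equiv (A \otimes I_m - hB \otimes J_m)\,y = e_1 \otimes z + h(B \otimes I_m)\,g, \qquad (4)$$

where $e_1 = (1, 0, \ldots, 0)^t \in \mathbb{R}^{s+1}$, $y = (y_0, \ldots, y_s)^t \in \mathbb{R}^{(s+1)m}$, $g = (g_0, \ldots, g_s)^t \in \mathbb{R}^{(s+1)m}$, and $A$ and $B$ are $(s+1)$-by-$(s+1)$ matrices given by: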





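$$
A = \begin{pmatrix}
1 & \cdots & 0 & & & \\
\alpha_0^{(1)} & \cdots & \alpha_\mu^{(1)} & & & \\
\vdots & & \vdots & & & \\
\alpha_0^{(\nu-1)} & \cdots & \alpha_\mu^{(\nu-1)} & & & \\
\alpha_0 & \cdots & \alpha_\mu & & & \\
 & \alpha_0 & \cdots & \alpha_\mu & & \\
 & & \ddots & & \ddots & \\
 & & & \alpha_0 & \cdots & \alpha_\mu \\
 & & & \alpha_0^{(s-\mu+\nu+1)} & \cdots & \alpha_\mu^{(s-\mu+\nu+1)} \\
 & & & \vdots & & \vdots \\
 & & & \alpha_0^{(s)} & \cdots & \alpha_\mu^{(s)}
\end{pmatrix}
$$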



and B can be defined similarly. The size of the matrix M is very large when h 
is small and/or m is large. If a direct method is used to solve the system (4), 
e.g., in the case of a d-level structure arising in d-dimensional partial differential 
equations, the operation count can be much higher for practical applications (see 
the numerical comparisons with a band solver in [2]). 
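
Because $M$ has the Kronecker structure in (4), an iterative solver never needs $M$ explicitly. The following sketch, written under that assumption only, applies $A \otimes I_m - hB \otimes J_m$ to a vector by reshaping it into an $(s+1) \times m$ array. The function name `apply_M` and the small random test matrices are hypothetical and are not taken from the paper.

```python
import numpy as np


def apply_M(A, B, J, h, y):
    """Return (A kron I_m - h * B kron J) @ y without forming the Kronecker products."""
    n, m = A.shape[0], J.shape[0]
    Y = y.reshape(n, m)                 # row j holds the block y_j
    return (A @ Y - h * (B @ Y @ J.T)).reshape(-1)


# Small illustrative data (random, not the BVM matrices of the paper).
rng = np.random.default_rng(0)
n, m, h = 6, 4, 0.05
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))
J = rng.standard_normal((m, m))
y = rng.standard_normal(n * m)

M_dense = np.kron(A, np.eye(m)) - h * np.kron(B, J)
print(np.max(np.abs(apply_M(A, B, J, h, y) - M_dense @ y)))
```

With banded $A$, $B$ and sparse $J_m$ stored in sparse formats, the same reshaping idea keeps the cost of one product proportional to the number of nonzeros instead of $((s+1)m)^2$.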

In [1,2], Bertaccini proposed to use Krylov subspace methods, such as Saad and Schultz's GMRES method, to solve (1). In order to speed up the convergence rate of Krylov subspace methods, he proposed circulant matrices as preconditioners. The first preconditioner proposed in [1,2] for the matrix $M$ in (4) is the well-known T. Chan circulant preconditioner, see [5]. The second one proposed in [1,2] is a new preconditioner that he called the P-circulant preconditioner. Moreover, Bertaccini [2] and Chan et al. [6] proposed the generalized Strang preconditioner $S$ for (4). They showed theoretically and numerically that both the P-circulant and generalized Strang preconditioned systems converge very quickly. However, when $J_m$ is singular (for instance in some ODEs, see [1]), the matrix $S$ is singular. The main aim of this paper is to propose a nonsingular block skew-circulant preconditioner for $M$.

We stress that, in the current literature, there exist some algorithms for banded Toeplitz linear systems whose theoretical computational cost is lower (see e.g. [7] and references in [5]). They are very effective in the symmetric positive definite case. However, the linear system (4) is usually unsymmetric and can have a high condition number, even if slowly growing with $s$ (at most linearly, see [3,4]). Thus, if the normal equations approach is used to solve (4), care should be taken to avoid possible severe numerical instability and/or very slow convergence if an iterative solver is used, even for simple problems, as observed in [3]. Moreover, notice that the diagonalization of the Jacobian matrix in (4) is usually very expensive (if possible at all) when $m$ is large and can be an ill-conditioned problem. Thus, the use of a solver that explicitly involves the above decomposition may not be appropriate (see also the implementation details in [2]).

The paper is organized as follows. In §2, we introduce the new block skew-circulant preconditioner and give the convergence analysis of our method. Finally, numerical examples are given in §3.

2 Construction of Skew-Circulant Preconditioners

In [1], Bertaccini proposed to use Krylov subspace methods with block-circulant preconditioners for solving (4). Two preconditioners were considered. The first one is the T. Chan block-circulant preconditioner $T$. It is defined as

$$T = c(A) \otimes I_m - h\,c(B) \otimes J_m, \qquad (5)$$

where $c(A)$ is the minimizer of $\|A - C\|_F$ over all $(s+1)$-by-$(s+1)$ circulant matrices $C$ under the Frobenius norm $\|\cdot\|_F$, see [5], and $c(B)$ is defined similarly. More precisely, the diagonals $\hat\alpha_j$ and $\hat\beta_j$ of $c(A)$ and $c(B)$ are given by

$$\hat\alpha_j = \left(1 - \frac{j}{s+1}\right)\alpha_{j+\nu} + \frac{j}{s+1}\,\alpha_{j+\nu-(s+1)}, \qquad j = 0, \ldots, s,$$

and $\hat\beta_j$ similarly but with $\beta_{j+\nu}$ instead of $\alpha_{j+\nu}$. The second preconditioner proposed in [1,2] is called the P-circulant preconditioner. It is defined as

$$P = \tilde A \otimes I_m - h \tilde B \otimes J_m, \qquad (6)$$

where the diagonals $\tilde\alpha_j$ and $\tilde\beta_j$ of $\tilde A$ and $\tilde B$ are given by

$$\tilde\alpha_j = \left(1 + \frac{j}{s+1}\right)\alpha_{j+\nu} + \frac{j}{s+1}\,\alpha_{j+\nu-(s+1)}, \qquad j = 0, \ldots, s,$$

and $\tilde\beta_j$ similarly but with $\beta_{j+\nu}$ instead of $\alpha_{j+\nu}$. Bertaccini [1,2] and Chan et al. [6] considered using the following generalized Strang preconditioner for (4):

$$S = s(A) \otimes I_m - h\,s(B) \otimes J_m, \qquad (7)$$



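where $s(A)$ is given by

$$
s(A) = \begin{pmatrix}
\alpha_\nu & \cdots & \alpha_\mu & & & \alpha_0 & \cdots & \alpha_{\nu-1} \\
\vdots & \ddots & & \ddots & & & \ddots & \vdots \\
\alpha_0 & & \ddots & & \ddots & & & \alpha_0 \\
 & \ddots & & \ddots & & \ddots & & \\
 & & \ddots & & \ddots & & \ddots & \\
\alpha_\mu & & & \ddots & & \ddots & & \alpha_\mu \\
\vdots & \ddots & & & \ddots & & \ddots & \vdots \\
\alpha_{\nu+1} & \cdots & \alpha_\mu & & & \alpha_0 & \cdots & \alpha_\nu
\end{pmatrix}
$$

and $s(B)$ can be defined similarly. Due to the consistency condition on the coefficients of the LMF, $\sum_{j=0}^{\mu} \alpha_j = 0$, the rows of $s(A)$ sum to zero and $s(A)$ is always singular. If, for simplicity, $J_m$ is diagonalizable, the eigenvalues of (7) are $\phi_j - h\psi_j\mu_r$, where $\phi_j$, $\psi_j$, $\mu_r$ are the eigenvalues of $s(A)$, $s(B)$, $J_m$, respectively. Then, we have the following result:

Lemma 1. If some eigenvalues of $J_m$ are zero, then the preconditioner $S$ is singular.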



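In this paper, we propose the following preconditioner for (4):

$$C = \tilde s(A) \otimes I_m - h\,\tilde s(B) \otimes J_m, \qquad (8)$$

where $\tilde s(A)$ is given by

$$
\tilde s(A) = \begin{pmatrix}
\alpha_\nu & \cdots & \alpha_\mu & & & -\alpha_0 & \cdots & -\alpha_{\nu-1} \\
\vdots & \ddots & & \ddots & & & \ddots & \vdots \\
\alpha_0 & & \ddots & & \ddots & & & -\alpha_0 \\
 & \ddots & & \ddots & & \ddots & & \\
0 & & \ddots & & \ddots & & \ddots & 0 \\
-\alpha_\mu & & & \ddots & & \ddots & & \alpha_\mu \\
\vdots & \ddots & & & \ddots & & \ddots & \vdots \\
-\alpha_{\nu+1} & \cdots & -\alpha_\mu & & & \alpha_0 & \cdots & \alpha_\nu
\end{pmatrix}
$$

and $\tilde s(B)$ can be defined similarly. We note that $\tilde s(A)$ and $\tilde s(B)$ are the Strang-type skew-circulant preconditioners of $A$ and $B$ respectively, see [5].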
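
To make the difference between the circulant and the skew-circulant wrap-around concrete, here is a minimal sketch. It assumes an illustrative two-term coefficient set $\alpha = (-1, 1)$ with $\nu = 1$ (a backward difference, so $\sum_j \alpha_j = 0$); these coefficients and the helper `strang_type` are hypothetical and are not the LMF coefficients used in the paper.

```python
import numpy as np


def strang_type(alpha, nu, n, sign):
    """n-by-n band Toeplitz matrix with alpha_nu on the main diagonal,
    wrapped into the corners with factor `sign`:
    sign = +1 gives the circulant, sign = -1 the skew-circulant variant."""
    mu = len(alpha) - 1
    S = np.zeros((n, n))
    for j in range(n):
        for i in range(mu + 1):
            k = j + i - nu                      # column of alpha_i in row j
            if 0 <= k < n:
                S[j, k] = alpha[i]
            else:
                S[j, k % n] = sign * alpha[i]   # wrapped entry
    return S


alpha, nu, n = [-1.0, 1.0], 1, 8                # illustrative, sums to zero
circ = strang_type(alpha, nu, n, +1)
skew = strang_type(alpha, nu, n, -1)

# Smallest singular values: zero for the circulant, positive for the skew-circulant.
print(np.linalg.svd(circ, compute_uv=False).min())
print(np.linalg.svd(skew, compute_uv=False).min())
```

For this particular choice the circulant matrix annihilates the vector of ones, while the sign change in the corner removes that null vector.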

Now we are going to prove that $C$ is invertible provided that the given BVM is $0_{\nu,\mu-\nu}$-stable. The stability of a BVM is closely related to two characteristic polynomials defined as follows:

$$\rho(z) = z^{\nu} \sum_{j=-\nu}^{\mu-\nu} \alpha_{j+\nu} z^{j} \quad \text{and} \quad \sigma(z) = z^{\nu} \sum_{j=-\nu}^{\mu-\nu} \beta_{j+\nu} z^{j}. \qquad (9)$$

Note that they are $\mu$-degree polynomials. A polynomial $p(z)$ of degree $\mu$ is an $N_{\nu,\mu-\nu}$-polynomial if

$$|z_1| \le |z_2| \le \cdots \le |z_\nu| \le 1 < |z_{\nu+1}| \le \cdots \le |z_\mu|,$$

the roots of unit modulus being simple.

Definition 1. [4, p. 97] Consider a BVM with the characteristic polynomial $\rho(z)$ given by (9). The BVM is said to be $0_{\nu,\mu-\nu}$-stable if $\rho(z)$ is an $N_{\nu,\mu-\nu}$-polynomial.

Definition 2. [4, p. 101] Consider a BVM with the characteristic polynomials $\rho(z)$ and $\sigma(z)$ given by (9). The region

$$\mathcal{D}_{\nu,\mu-\nu} = \{ q \in \mathbb{C} : \rho(z) - q\sigma(z) \text{ has } \nu \text{ zeros inside } |z| = 1 \text{ and } \mu - \nu \text{ zeros outside } |z| = 1 \}$$

is called the region of $A_{\nu,\mu-\nu}$-stability of the given BVM. Moreover, the BVM is said to be $A_{\nu,\mu-\nu}$-stable if

$$\mathbb{C}^- \equiv \{ q \in \mathbb{C} : \mathrm{Re}(q) < 0 \} \subseteq \mathcal{D}_{\nu,\mu-\nu}.$$

Theorem 1. If some eigenvalues of $J_m$ are zero while the others are in $\mathbb{C}^-$ and the BVM for (2) is $0_{\nu,\mu-\nu}$-stable, then the preconditioner $C$ is nonsingular.

However, if all the eigenvalues of $J_m$ are nonzero, then we can apply an $A_{\nu,\mu-\nu}$-stable BVM and we have the following result, similar to the one for the Strang circulant preconditioner $S$ (see [6,2]).

Theorem 2. If the BVM for (2) is $A_{\nu,\mu-\nu}$-stable and $h\lambda_k(J_m) \in \mathcal{D}_{\nu,\mu-\nu}$, then the preconditioner $C$ is nonsingular.

Next we show that the spectrum of the preconditioned system is clustered around 1 and hence Krylov subspace methods will converge fast if applied to solving the preconditioned system, see [6].

Theorem 3. All the eigenvalues of the preconditioned matrix $C^{-1}M$ are 1 except for at most $2m\mu$ outliers. When Krylov subspace methods are applied to solving the preconditioned system $C^{-1}My = C^{-1}b$, the method will converge in at most $2m\mu + 1$ iterations in exact arithmetic.

Regarding the cost per iteration, the main work in each iteration for Krylov subspace methods is the matrix-vector multiplication

$$C^{-1}Mz = (\tilde s(A) \otimes I_m - h\,\tilde s(B) \otimes J_m)^{-1}(A \otimes I_m - hB \otimes J_m)\,z.$$

Since $A$, $B$ are banded matrices and $J_m$ is assumed to be sparse, the matrix-vector multiplication $(A \otimes I_m - hB \otimes J_m)z$ can be done very fast. To compute $C^{-1}(Mz)$, since $\tilde s(A)$ and $\tilde s(B)$ are skew-circulant matrices, we have the following decompositions $\tilde s(A) = DF\Lambda_A F^*D^*$ and $\tilde s(B) = DF\Lambda_B F^*D^*$, where

$$D = \mathrm{diag}(1, e^{-i\pi/(s+1)}, \ldots, e^{-si\pi/(s+1)}),$$

$\Lambda_A$ and $\Lambda_B$ are diagonal matrices containing the eigenvalues of $\tilde s(A)$ and $\tilde s(B)$ respectively and $F$ is the Fourier matrix, see [5]. It follows that

$$C^{-1}(Mz) = (DF \otimes I_m)(\Lambda_A \otimes I_m - h\Lambda_B \otimes J_m)^{-1}(F^*D^* \otimes I_m)(Mz).$$

This product can be obtained by using Fast Fourier Transforms and solving $s+1$ linear systems of order $m$. Since $J_m$ is sparse, the matrix $\Lambda_A \otimes I_m - h\Lambda_B \otimes J_m$ will also be sparse. Thus $C^{-1}(Mz)$ can be obtained by solving $s+1$ sparse $m$-by-$m$ linear systems.
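
The procedure just described can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the skew-circulant factors are specified by hypothetical first columns `ca` and `cb` (not the paper's LMF coefficients), $J_m$ is a small dense tridiagonal matrix, and the diagonal scaling is written with powers of $e^{i\pi/n}$ followed by `numpy.fft.fft`, which is equivalent to the factorization above up to the ordering of the eigenvalues.

```python
import numpy as np


def skew_circulant(c):
    """Dense skew-circulant matrix with first column c (used for checking only)."""
    n = len(c)
    S = np.empty((n, n))
    for j in range(n):
        for k in range(n):
            S[j, k] = c[j - k] if j >= k else -c[n + j - k]
    return S


def apply_C_inverse(ca, cb, J, h, v):
    """Solve (S_a kron I - h * S_b kron J) y = v with FFTs and n small solves."""
    n, m = len(ca), J.shape[0]
    omega = np.exp(1j * np.pi * np.arange(n) / n)     # diagonal scaling
    lam_a = np.fft.fft(ca / omega)                    # eigenvalues of S_a
    lam_b = np.fft.fft(cb / omega)                    # eigenvalues of S_b
    W = np.fft.fft(np.conj(omega)[:, None] * v.reshape(n, m), axis=0)
    Z = np.empty_like(W)
    eye = np.eye(m)
    for k in range(n):                                # n decoupled m-by-m systems
        Z[k] = np.linalg.solve(lam_a[k] * eye - h * lam_b[k] * J, W[k])
    Y = omega[:, None] * np.fft.ifft(Z, axis=0)
    return Y.reshape(-1)


# Hypothetical data chosen so that C is safely nonsingular.
n, m, h = 8, 3, 0.1
ca = np.array([3.0, -1.0, 0.5, 0.0, 0.0, 0.0, 0.0, 0.25])
cb = np.array([1.0, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])
J = -2.0 * np.eye(m) + np.eye(m, k=1) + np.eye(m, k=-1)
v = np.arange(1.0, n * m + 1.0)

C = np.kron(skew_circulant(ca), np.eye(m)) - h * np.kron(skew_circulant(cb), J)
y_fft = apply_C_inverse(ca, cb, J, h, v)
print(np.max(np.abs(y_fft - np.linalg.solve(C, v))))
```

In a production code the dense `np.linalg.solve` inside the loop would be replaced by a sparse solver, since each block inherits the sparsity of $J_m$.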



3 Numerical Tests 

To compare the effectiveness of our preconditioner with various circulant approx- 
imations we have considered two test problems. We have omitted comparisons 
with the preconditioner based on the T. Chan circulant approximation because 
the T. Chan preconditioner can be very ill-conditioned, see [2]. 

We will compare the number of iterations needed to converge for the GMRES 
method. More numerical tests can be found in [1,2, 3, 6]. Some implementation 
details can be found in [1]. The initial guess for those iterative solvers is the
zero vector. The stopping criterion is \\rjW 2 < 10 ®||&|| 2 , fj true residual after j 
iterations. All experiments are performed in MATLAB. 

Moreover, we list the condition numbers of the matrix of the underlying linear system and of the different block preconditioners, estimated with the LINPACK 1-norm procedure. We will see that the condition numbers of the original system and of the P-circulant and skew-circulant preconditioners are often about the same, unlike that of the Strang preconditioner.

Example 1: We consider the advection equation of first order with periodic 
boundary conditions 



$$\frac{\partial u}{\partial t} - \frac{\partial u}{\partial x} = 0,$$
$$u(x, 0) = x(\pi - x), \quad x \in [0, \pi],$$
$$u(\pi, t) = u(0, t), \quad t \in [0, 2\pi].$$



We discretize the partial derivative $\partial/\partial x$ with central differences and step size $\delta x = \pi/m$. We obtain a family of systems of ODEs with an $m \times m$ skew-symmetric Jacobian matrix.

The generalized Adams method with $k = 3$ (order 4, see [4] for the coefficients), suitable for ODE problems whose Jacobian matrix has eigenvalues on the imaginary axis, is used to solve the above differential equation. The number of matrix-vector products required to solve the related linear system is given in Table 1. It can be observed that the skew-circulant-based block preconditioned iterations usually converge quickly, while the Strang-based one cannot be used for odd $m$ because the Jacobian matrix has an eigenvalue equal to zero.
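
The remark about odd $m$ can be checked numerically. The sketch below assumes that the Jacobian is the tridiagonal skew-symmetric Toeplitz matrix with entries $\pm 1/(2\,\delta x)$ on the off-diagonals and no corner entries; the paper does not spell out the matrix, so this form, like the helper names, is an assumption chosen because it reproduces the stated behaviour.

```python
import numpy as np


def advection_jacobian(m):
    """Assumed central-difference Jacobian: skew-symmetric, tridiagonal, Toeplitz."""
    dx = np.pi / m
    J = np.zeros((m, m))
    i = np.arange(m - 1)
    J[i, i + 1] = 1.0 / (2.0 * dx)
    J[i + 1, i] = -1.0 / (2.0 * dx)
    return J


def smallest_eigenvalue_modulus(J):
    """1j * J is Hermitian when J is real skew-symmetric, so eigvalsh applies."""
    return np.min(np.abs(np.linalg.eigvalsh(1j * J)))


for m in (25, 50, 75):
    print(m, smallest_eigenvalue_modulus(advection_jacobian(m)))
```

Under this assumption the eigenvalues are purely imaginary, and a zero eigenvalue appears exactly when $m$ is odd, which is the case marked with * in the Strang columns of Table 1.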



Table 1. (Example 1) Number of matrix-vector multiplications required for
convergence of GMRES, where * denotes that the preconditioner cannot be
used and its condition number is undefined

            No Precond.       Strang            P-circulant       Skew-circulant
m     s     Iter.    Cond.    Iter.    Cond.    Iter.    Cond.    Iter.    Cond.
25    8     157      170      *        *        23       130      30       150
      16    136      280      *        *        22       200      28       1700
      32    98       480      *        *        21       340      21       840
50    8     299      330      23       5700     20       230      36       570
      16    328      530      28       4000     23       340      30       790
      32    234      770      34       70000    28       580      24       2500
75    8     >500     450      *        *        20       330      38       9100
      16    >500     660      *        *        25       500      31       590
      32    430      1200     *        *        26       780      43       3400

Example 2: Let us consider the heat equation with a variable diffusion coefficient

$$\frac{\partial u}{\partial t} - \frac{\partial}{\partial x}\left( a(x) \frac{\partial u}{\partial x} \right) = 0,$$
$$u(0, t) = u(\pi, t) = 0, \quad t \in [0, 2\pi],$$
$$u(x, 0) = x, \quad x \in [0, \pi].$$

We discretize the operator $\partial/\partial x$ with centered differences and step size $\delta x = \pi/(m + 1)$, obtaining a system of $m$ ODEs whose $m \times m$ Jacobian matrix is tridiagonal (Toeplitz if and only if $a(x)$ is constant). We note that the Jacobian matrix has real and strictly negative eigenvalues. Here, $a(x) = \exp(-x^{\beta})$.
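
As an illustration of the structure described above, the following sketch builds a tridiagonal Jacobian for this problem. It assumes the usual conservative scheme that evaluates $a(x)$ at the midpoints $x_i \pm \delta x/2$; the paper does not state which variant of centered differences was used, so this choice and the name `heat_jacobian` are assumptions.

```python
import numpy as np


def heat_jacobian(m, beta):
    """Tridiagonal Jacobian of d/dx(a(x) du/dx) with a(x) = exp(-x**beta),
    homogeneous Dirichlet conditions, a evaluated at cell midpoints (assumed)."""
    dx = np.pi / (m + 1)
    x = dx * np.arange(1, m + 1)                    # interior nodes
    a_plus = np.exp(-(x + dx / 2) ** beta)          # a at x_i + dx/2
    a_minus = np.exp(-(x - dx / 2) ** beta)         # a at x_i - dx/2
    J = (np.diag(-(a_plus + a_minus))
         + np.diag(a_plus[:-1], 1)
         + np.diag(a_minus[1:], -1))
    return J / dx**2


J = heat_jacobian(20, beta=1.0)
print(np.linalg.eigvalsh(J).max())                  # largest eigenvalue, negative
for beta in (1.0, 3.0):
    print(beta, np.linalg.cond(heat_jacobian(20, beta)))
```

With this discretization the matrix is symmetric, and the rapid decay of $a(x)$ for larger $\beta$ spreads the eigenvalues over many orders of magnitude.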



Table 2. (Example 2) Number of matrix-vector multiplications required for 
convergence of GMRES 







            No Precond.          Strang               P-circulant          Skew-circulant
m     s     Iter.    Cond.       Iter.    Cond.       Iter.    Cond.       Iter.    Cond.
20    8     75       1.9×10^?    14       3.0×10^?    13       1.3×10^?    9        7.7×10^?
      16    114      1.9×10^?    14       3.0×10^?    13       1.3×10^?    9        7.7×10^?
      32    159      1.9×10^?    14       3.0×10^?    14       1.3×10^?    9        7.7×10^?
50    8     193      1.2×10^?    44       1.0×10^?    14       8.1×10^?    10       4.8×10^?
      16    308      1.2×10^?    34       1.0×10^?    15       8.1×10^?    10       4.8×10^?
      32    453      1.2×10^?    55       1.0×10^?    15       8.1×10^?    10       4.8×10^?
100   8     >500     4.5×10^?    40       6.0×10^?    14       3.0×10^?    10       2.0×10^?
      16    >500     4.5×10^?    49       6.0×10^?    15       3.0×10^?    10       2.0×10^?
      32    >500     4.5×10^?    70       6.0×10^?    15       3.0×10^?    10       2.0×10^?



The generalized Adams method with $k = 4$ (order 5, see [4] for the coefficients), suitable for stiff problems, is used to solve the differential problem. The number of matrix-vector products needed to solve the related linear system, when $\beta = 3$, is given in Table 2. It can be observed that the Strang preconditioner can be used if $\beta$ is between 0 and 1 and $m$ is not too large. For instance, when $\beta = 3$, the number of iterations required with the Strang preconditioner increases significantly when $m$ increases.

Moreover, we find that the ill-conditioning of the Strang circulant approximation gives polluted numerical results already when $m$ is of the order of a hundred. For $\beta > 3$, the Strang block preconditioner cannot be used at all because it is severely ill-conditioned even if double precision is in use. However, the new skew-circulant preconditioner performs very well.
Abstract. We present new 6-th and 8-th order explicit symplectic Runge-Kutta-Nyström methods for Hamiltonian systems which are more efficient than other previously known algorithms. The methods use the processing technique and non-trivial flows associated with different elements of the Lie algebra involved in the problem. Both the processor and the kernel are compositions of explicitly computable maps.



1 Introduction 

In Hamiltonian dynamics, a frequent special case occurs when the Hamiltonian function reads

$$H(q, p) = \frac{1}{2}\, p^T M^{-1} p + V(q), \qquad (1)$$

with $M$ a constant, symmetric, invertible matrix. In this situation the equations of motion are

$$\dot q = M^{-1} p, \qquad \dot p = -\nabla_q V(q) \qquad (2)$$

or, after elimination of $p$,

$$\ddot q = -M^{-1} \nabla_q V(q). \qquad (3)$$

It is therefore natural to consider Runge-Kutta-Nystrom (RKN) methods 
when the second order system (3) has to be solved numerically. These methods 
can be rendered symplectic, thus preserving qualitative features of the phase 
space of the original Hamiltonian dynamical system. In fact, a number of sym- 
plectic RKN schemes of order < 4 have been designed during the last decade 
which outperform standard non-symplectic methods (see [9] for a review), and 
the recent literature has devoted much attention to the integration of (1) by 
means of efficient high-order symplectic algorithms [5,6,8,11]. The usual ap- 
proach is to compose a number of times the exact flows corresponding to the 
kinetic and potential energy in (1) with appropriately chosen weights to achieve 
the desired order. More specifically, if A and B denote the Lie operators 

$$A = M^{-1} p \cdot \nabla_q, \qquad B = -(\nabla_q V) \cdot \nabla_p \qquad (4)$$

L. Vulkov, J. Wasniewski, and P. Yalamov (Eds.): NAA 2000, LNCS 1988, pp. 102—109, 2001. 

(c) Springer- Verlag Berlin Heidelberg 2001 
associated with $\frac{1}{2} p^T M^{-1} p$ and $V(q)$, respectively [1], then the exact solution of (2) can be written as

$$z(t) = e^{t(A+B)} z(0) \equiv e^{t(A+B)} z_0,$$

where $z = (q, p)^T$, and the evolution operator $e^{h(A+B)}$ for one time step $h = t/N$ is approximated by

$$e^{h(A+B)} \simeq e^{hH_a} \equiv \prod_{i=1}^{s} e^{h a_i A}\, e^{h b_i B} \qquad (5)$$

with

$$e^{h a A} z_0 = (q_0 + h a M^{-1} p_0,\; p_0)^T, \qquad e^{h b B} z_0 = (q_0,\; p_0 - h b \nabla_q V(q_0))^T. \qquad (6)$$

Observe that the approximate solution $z_a(t) = e^{tH_a} z_0$ evolves in the Lie group whose Lie algebra $L(A, B)$ is generated by $A$ and $B$ with the usual Lie bracket of vector fields [1].
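
A composition of the form (5) is easy to code once the two flows in (6) are available. The sketch below is a minimal illustration, not one of the new methods: it uses the classical Störmer-Verlet coefficients $a = (1/2, 1/2)$, $b = (1, 0)$ on a harmonic oscillator with $V(q) = q^2/2$ and $M = 1$, and all function names are hypothetical. The factors are applied in the order in which they are listed, which is immaterial for this symmetric choice.

```python
import numpy as np


def composition_step(q, p, h, a, b, grad_V, M_inv):
    """One step of prod_i exp(h a_i A) exp(h b_i B), using the flows in (6)."""
    for a_i, b_i in zip(a, b):
        q = q + h * a_i * (M_inv @ p)      # flow of A: drift
        p = p - h * b_i * grad_V(q)        # flow of B: kick
    return q, p


def integrate(q0, p0, t_end, N, a, b, grad_V, M_inv):
    q, p = np.array(q0, dtype=float), np.array(p0, dtype=float)
    h = t_end / N
    for _ in range(N):
        q, p = composition_step(q, p, h, a, b, grad_V, M_inv)
    return q, p


# Stormer-Verlet written as an ABA composition (order 2).
a, b = (0.5, 0.5), (1.0, 0.0)
grad_V = lambda q: q                        # V(q) = q^2 / 2
M_inv = np.eye(1)


def error(N):
    """Error at t = 1 against the exact solution q = cos t, p = -sin t."""
    q, p = integrate([1.0], [0.0], 1.0, N, a, b, grad_V, M_inv)
    return np.hypot(q[0] - np.cos(1.0), p[0] + np.sin(1.0))


print(error(50) / error(100))               # close to 4 for a second-order method
```

Higher-order schemes of the same family only change the coefficient tuples `a` and `b`; the stepping routine stays the same.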

The coefficients $a_i$, $b_i$ in (5) are determined by imposing that

$$H_a = A + B + O(h^n) \qquad (7)$$

to obtain an $n$-th order symplectic integration method. This makes it necessary to solve a system of polynomial equations, which can be extraordinarily involved even for moderate values of $n$, so that various symmetries are usually imposed in (5) to reduce the number of determining equations. For instance, if the composition is left-right symmetric then $H_a$ does not contain odd powers of $h$, but then the number of flows to be composed increases. Although additional simplifications also take place due to the vanishing of the Lie bracket $[B, [B, [B, A]]]$ for the Hamiltonian (1), the question of the existence of high-order RKN symplectic integrators more efficient than standard schemes is still open.

Recently, the use of the processing technique has made it possible to develop extremely efficient methods of orders 4 and 6 [2]. The idea is to consider the composition

$$e^{hH(h)} = e^{P} e^{hK} e^{-P} \qquad (8)$$

in order to reduce the number of evaluations: after $N$ steps we have $e^{t(A+B)} \simeq e^{tH(h)} = e^{P} (e^{hK})^N e^{-P}$. At first $e^{P}$ (the processor) is applied, then $e^{hK}$ (the kernel) acts once per step, and finally $e^{-P}$ is evaluated only when output is needed. Both the kernel and the processor are taken as compositions of flows corresponding to $A$ and $B$, in a similar way to (5).
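
The effect of processing can be seen on the simplest possible case. In the sketch below, which is a textbook illustration and not one of the methods of this paper, the kernel is the first-order symplectic Euler map (one kick followed by one drift) and the processor is a half-step drift. Applying the processor once, the kernel $N$ times, and the inverse processor once reproduces $N$ Störmer-Verlet steps, which are second-order accurate. The test problem ($V(q) = q^2/2$, $M = 1$) and all names are assumptions made for the example.

```python
import numpy as np


def drift(q, p, tau):
    return q + tau * p, p                   # flow of A with M = 1


def kick(q, p, tau, grad_V):
    return q, p - tau * grad_V(q)           # flow of B


def kernel_only(q, p, h, N, grad_V):
    """N steps of the first-order kernel (symplectic Euler)."""
    for _ in range(N):
        q, p = kick(q, p, h, grad_V)
        q, p = drift(q, p, h)
    return q, p


def processed(q, p, h, N, grad_V):
    q, p = drift(q, p, h / 2)               # processor, applied once
    q, p = kernel_only(q, p, h, N, grad_V)  # kernel, once per step
    return drift(q, p, -h / 2)              # inverse processor, only for output


def verlet(q, p, h, N, grad_V):
    for _ in range(N):
        q, p = drift(q, p, h / 2)
        q, p = kick(q, p, h, grad_V)
        q, p = drift(q, p, h / 2)
    return q, p


grad_V = lambda q: q                        # harmonic oscillator
h, N = 0.01, 100
exact = np.array([np.cos(1.0), -np.sin(1.0)])
err_kernel = np.linalg.norm(np.array(kernel_only(1.0, 0.0, h, N, grad_V)) - exact)
err_processed = np.linalg.norm(np.array(processed(1.0, 0.0, h, N, grad_V)) - exact)
print(err_kernel, err_processed)
```

The cost per step is that of the kernel alone, which is the reason why a cheap kernel combined with a more elaborate processor can pay off.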

In this paper, by combining the processing technique with the use of non-trivial flows associated with different elements of $L(A, B)$ we obtain optimal 6-th order RKN methods more efficient than others previously known and some 8-th order symplectic schemes with fewer function evaluations per step. The analysis can also be easily extended to a more general class of second order differential equations.
2 Analysis and New Methods

In addition to $A$ and $B$ there are other elements in $L(A, B)$ whose flow is explicitly and exactly computable. In particular, the flow corresponding to the operators

$$V_{3,1} = [B, [A, B]], \qquad V_{5,1} = [B, [B, [A, [A, B]]]],$$
$$V_{7,1} = [B, [A, [B, [B, [A, [A, B]]]]]], \qquad V_{7,2} = [B, [B, [B, [A, [A, [A, B]]]]]] \qquad (9)$$

has an expression similar to the second equation of (6) by replacing $\nabla_q V$ with an appropriate function $g(q)$ [3]. Therefore it is possible to evaluate exactly $\exp(hC_{b,c,d,e,f})$, with

$$C_{b,c,d,e,f} = bB + h^2 c V_{3,1} + h^4 d V_{5,1} + h^6 (e V_{7,1} + f V_{7,2}), \qquad (10)$$

$b$, $c$, $d$, $e$, and $f$ being free parameters. We can then substitute some of the $e^{h b_i B}$ factors by the more general ones $e^{h C_{b_i,c_i,d_i,e_i,f_i}}$ both in the kernel and the processor in order to reduce the number of evaluations and thus improve the overall efficiency. The operator $C_{b,c,d,e,f}$ will be referred to in the sequel as the modified potential, and we simply write $C_{b,c}$ when $d = e = f = 0$.

By repeated application of the Baker-Campbell-Hausdorff formula [10] the kernel and processor generators $K$ and $P$ can be written as

$$K = A + B + \sum_{i=2}^{\infty} h^{i-1} \left\{ \sum_{j=1}^{d(i)} k_{i,j} E_{i,j} \right\}, \qquad P = \sum_{i=1}^{\infty} h^{i} \left\{ \sum_{j=1}^{d(i)} p_{i,j} E_{i,j} \right\}, \qquad (11)$$

where $d(m)$ denotes the dimension of the space spanned by brackets of order $m$ of $A$ and $B$ (its first 8 values being 2, 1, 2, 2, 4, 5, 10, 15) and $\{E_{m,j}\}_{j=1}^{d(m)}$ is a basis of this space. Therefore

$$H(h) = e^{P} K e^{-P} = A + B + \sum_{i=2}^{\infty} h^{i-1} \left\{ \sum_{j=1}^{d(i)} f_{i,j} E_{i,j} \right\}, \qquad (12)$$

where the $f_{i,j}$ coefficients are given in terms of polynomials involving $k_{i,j}$ and $p_{i,j}$ [2]. Specific $n$-th order integration methods require that $f_{i,j} = 0$ up to $i = n$, and these equations impose restrictions on the kernel: it must satisfy $k(n) = d(n) - 1$ independent conditions ($n > 2$) [2], and $k(2n) = k(2n - 1)$ if it is a symmetric composition. The explicit form of these conditions and the coefficients $p_{i,j}$ of the processor $P$ in terms of $k_{i,j}$ up to order 8 have been obtained in [3]. It has also been shown that the kernel completely determines the optimal method we can obtain by processing [2]. Here optimal means that the main term of the local truncation error attains a minimum.

As stated above, we take as processor of a RKN method the explicitly computable composition

$$e^{P} = \prod_{i=1}^{r} e^{h z_i A}\, e^{h y_i B}, \qquad (13)$$

where the replacement $\exp(h y_i B) \mapsto \exp(h C_{y_i, v_i, \ldots})$ can be done when necessary, and the number $r$ of $B$ (or $C$) evaluations is chosen to guarantee that the $\sum_{i=1}^{n-1} d(i)$ equations $p_{i,j} = p_{i,j}(z_k, y_k)$ have real solutions.

As far as the kernel is concerned, due to the different character of the operators $A$ and $B$, two types of symmetric compositions have been analyzed:

(i) Type ABA ($\sum_{i=1}^{s+1} a_i = \sum_{i=1}^{s} b_i = 1$):

$$e^{hK} = e^{h a_1 A} e^{h b_1 B} e^{h a_2 A} \cdots e^{h a_s A} e^{h b_s B} e^{h a_{s+1} A} \qquad (14)$$

with $a_{s+2-i} = a_i$ and $b_{s+1-i} = b_i$.

(ii) Type BAB ($\sum_{i=1}^{s} a_i = \sum_{i=1}^{s+1} b_i = 1$):

$$e^{hK} = e^{h b_1 B} e^{h a_1 A} e^{h b_2 B} \cdots e^{h b_s B} e^{h a_s A} e^{h b_{s+1} B} \qquad (15)$$

with $a_{s+1-i} = a_i$ and $b_{s+2-i} = b_i$.

A systematic analysis of the 6-th order case was carried out in [3], where a number of optimal processed methods with modified potentials and $s = 2, 3$ were obtained. Some methods involving only $A$ and $B$ evaluations with $s = 4, 5, 6$ were also reported there, with their corresponding truncation errors. Here we have generalized the study to seven stages ($s = 7$). Now the three free parameters allow us to find an extremely efficient 6-th order processed method: it has error coefficients which are approximately 50 times smaller than those of the most efficient 6-th order symplectic non-processed RKN method with $s = 7$ given in [8]. In Table 1 we collect the coefficients of this new processed method and also of the most efficient 6-th order algorithm we have found involving the modified potential $C_{b,c,0,e,f}$ in the kernel and $C_{y,v}$ in the processor.

A similar study can be carried out, in principle, for the 8-th order case, although now the number of possibilities (and solutions) increases appreciably with respect to $n = 6$, so that the analysis becomes extraordinarily intricate. Here we have considered kernels with $s = 4, 5$ involving modified potentials and $s = 9, 10, 11$ when only $A$ and $B$ evaluations are incorporated. Taking into account the well known fact that methods with small coefficients have been shown to be very efficient [6], we apply this strategy for locating possible kernels. The coefficients of two of them are given in Table 1, although many others are available.

On the other hand, the coefficients $z_k$, $y_k$ in the processor (13) have to satisfy 26 equations, but this number can be reduced by taking different types of compositions. For instance, if the coefficients in $e^{Q(h)} = \prod_{i} e^{h z_i A} e^{h y_i B}$ are determined in such a way that $Q(h) = \frac{1}{2} P(h) + O(h^7)$, then $e^{Q(h)} e^{Q(-h)} = e^{P(h)} + O(h^8)$ because $P(h)$ is an even function of $h$ up to order $h^8$. Then, only 16 equations are involved. Here too, the criterion we follow is to choose the smallest coefficients $z_k$, $y_k$.
Table 1. Coefficients of the new symplectic RKN integrators with processing



Order 6; Type BAB; s = 7; r = 8

b_1 = 0.115899400930169      b_2 = -1.21532440212000       b_3 = 1.45706208067905
a_1 = 0.244868573793901      a_2 = -0.00214552789272415    a_3 = 0.301340867944477
z_1 = -0.350316247513416     z_2 = 0.0744434640156453      z_3 = -0.0369370026731913
z_4 = -0.0597184197245884    z_5 = 0.404915108936223       z_6 = -0.180941427380936
z_7 = -0.0346188279494959    z_8 = -sum_{i=1}^{7} z_i
y_1 = 0.218575120792731      y_2 = -0.370670464937763      y_3 = 0.342037685653768
y_4 = -0.225359207496863     y_5 = 0.0878524557495559      y_6 = 0.195239165175742
y_7 = -0.155222704734044     y_8 = -sum_{i=1}^{7} y_i



Order 6 ; Type ABA; s — 3; r — 6 ; Modified potential 



ai = -0.0682610383918630 bi = 0.2621129352517028 
ca = 0.0164011128160783 
ea = 1.86194612413481 • lO"'^ 
21 = 0.1604630501234888 yi = -0.012334538446142270 
za = -0.1222126706298830 ya = -0.6610294848488182 

23 = 0.1916801124727711 ye = -0.023112349678219939 

24 = 0.5630722377955035 yt = 1.81521815949959 • lO”"* 

25 = -0.7612758792358986 ys = 2.3768244683666757 

E 5 

.--1 ye = - Z^.-i V' 



Cl = di = ei = /i = 0 
da = 0 

/a = -6.3155794861591 • 10~® 
VI = 0.013816178183636998 
V2 = -0.050288359617427786 
1)3 = -0.013462400168471472 
Vi = 6.03819193361427 • 10^* 
ve — —0.01 
ve — 0.01 



Order 8 ; Type BAB; s = 11; r = 8 

bi = 0.03906544126305366 ba = 0.216015988434324 63 = -0.126717696299036 

bi = -0.04128542496526060 be = 0.04458478096712717 

ai = 0.142940453575212 aa = 0.309791505162032 03 = 0.301210185530089 

04 = -0.005822573683400349 05 = -0.344741324170165 

21 = -0.0295940574778285 za = 0.0102454583206065 23 = 0.168519324003820 

24 = -0.577391651425342 25 = 0.0991834279391326 23 = 0.0203810695211463 



27 = -0.106234446989598 
yi = 0.175492972679660 
yi = 0.0926169248899539 
yj = -0.0918456713646654 




ya = -0.372698829093994 ye = -0.00224032125918971 
ye = -0.201446308655374 ye = 0.216983390044259 

ye ^ - ELi y* 



Order 8 ; Type BAB; s — 5] r = 7; Modified potential 



di - 0.0001219127419188233 
ei = 5.741889879702246 • 10"® 
be = -0.1945897221635392 
oi = 0.6954511641703808 
21—0 

2 a = -0.004624860718237988 

23 = 0.3423219445639433 

24 = 0.1760176996772205 

25 = 0.3625045293826689 

26 = -0.2729727321466362 




/i = -2.271708973531348 • 10"® 
ca = 5.222572249380952 • 10"* 
oa — —0.05 

yi = 0.3644761259072299 
ya = -0.2849544383272169 
ye = 0.2023898776842639 
yi = -0.2743578195701579 
ye = -4.75975395524748 ■ 10"® 
ye = 0.1455974775779454 

yj = - X)Li y< 



VI = 0.016298916362212911 
V2 = -0.019769812343547362 
ve = 0.004608026684270971 

V4 — 0 
V5 — 0 
V6 — 0 
vj — 0 



3 A Numerical Example 

To test the efficiency of these new symplectic methods in practice, we compare
them with other schemes of similar order of consistency on a specific example.
For order 6, these are the most efficient seven-stage method designed by Okunbor
and Skeel, OS6 [8], and the non-symplectic variable-step RKN method, DP6,
obtained in [4]. For order 8, we compare with the symplectic integrator due to
Yoshida [11] (Yos8, 15 function evaluations per step), the method obtained by
McLachlan [6] (McL8, 17 stages) and the optimized symmetric scheme designed
by Calvo and Sanz-Serna [5] (CSS8, 24 evaluations).

The example we consider is the perturbed Kepler Hamiltonian 

(16) 

with $r = \sqrt{x^2 + y^2}$. To first approximation, this Hamiltonian describes
the dynamics of a satellite moving in the gravitational field produced by a
slightly oblate planet. The motion takes place in a plane containing the symmetry
axis of the planet [7].

We take $\varepsilon = 0.001$, which approximately corresponds to a satellite
moving under the influence of the Earth, and initial conditions $x = 1 - e$,
$y = 0$, $p_x = 0$, $p_y = \sqrt{(1+e)/(1-e)}$, with $e = 0.5$. We integrate the
trajectory up to the final time $t_f = 1000\pi$ and then compute the error in
energy, which is represented (on a log-log scale) as a function of the number of
B evaluations.
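The experiment just described can be imitated in a few lines. The following is only a minimal sketch under stated assumptions: the Hamiltonian (16) did not survive in the text, so the code assumes the commonly used oblate-planet form $H = \frac12(p_x^2 + p_y^2) - \frac1r - \frac{\varepsilon}{2r^3}\left(1 - \frac{3x^2}{r^2}\right)$, and it uses the plain second-order Störmer-Verlet (BAB) scheme rather than any of the processed methods of Table 1. All function names are illustrative.

```python
import math

EPS = 0.001   # perturbation parameter, as in the text
ECC = 0.5     # eccentricity e of the initial conditions

def potential(x, y, eps=EPS):
    # ASSUMED form of the perturbed Kepler potential (equation (16) is missing).
    r2 = x * x + y * y
    r = math.sqrt(r2)
    return -1.0 / r - eps / (2.0 * r**3) * (1.0 - 3.0 * x * x / r2)

def grad_potential(x, y, eps=EPS):
    # Analytic gradient of the assumed potential.
    r = math.sqrt(x * x + y * y)
    r3, r5, r7 = r**3, r**5, r**7
    gx = x / r3 + 4.5 * eps * x / r5 - 7.5 * eps * x**3 / r7
    gy = y / r3 + 1.5 * eps * y / r5 - 7.5 * eps * x * x * y / r7
    return gx, gy

def energy(x, y, px, py, eps=EPS):
    return 0.5 * (px * px + py * py) + potential(x, y, eps)

def max_energy_error(n_steps, t_final=2.0 * math.pi):
    """Largest |H - H0| along a Stormer-Verlet trajectory with n_steps steps."""
    h = t_final / n_steps
    x, y = 1.0 - ECC, 0.0
    px, py = 0.0, math.sqrt((1.0 + ECC) / (1.0 - ECC))
    e0 = energy(x, y, px, py)
    gx, gy = grad_potential(x, y)
    worst = 0.0
    for _ in range(n_steps):
        px -= 0.5 * h * gx          # half kick (flow of B)
        py -= 0.5 * h * gy
        x += h * px                 # drift (flow of A)
        y += h * py
        gx, gy = grad_potential(x, y)
        px -= 0.5 * h * gx          # half kick (flow of B)
        py -= 0.5 * h * gy
        worst = max(worst, abs(energy(x, y, px, py) - e0))
    return worst
```

Calling `max_energy_error(2000)` and `max_energy_error(4000)` gives two points of an error-versus-cost curve of the kind plotted in Fig. 1; for a second-order scheme the error should drop by roughly a factor of four when the number of B evaluations doubles.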

Obviously, the computational cost of evaluating the modified potential must
be estimated. This has been done by running the same program repeatedly with
different types of modified potential and with the evaluation of B only. We
observe that, for this problem, an algorithm using $C_{b,c,d,e,f}$ is twice as
expensive as the same algorithm involving only B evaluations, and only 20% more
costly when $C_{b,c}$ are involved. This is due to the reuse of certain
calculations in the modified potentials.

With this estimate, we present in Fig. 1(a) the results obtained with the 6-th
order processed methods of Table 1, in comparison with DP6 and OS6, whereas
the relative performance of the 8-th order symplectic schemes is shown in Fig.
1(b). Solid lines denoted by pmk and pk, k = 6, 8, are obtained by the new
methods with and without modified potentials, respectively.

It is worth noticing the great performance of the symplectic processed schemes
of Table 1 with respect to other standard symplectic and non-symplectic
algorithms. This is particularly noticeable in the case of the 6-th order
integrators, due to the fact that a full optimization strategy has been carried
out in the construction process. In the case of order 8, the new methods are also
more efficient than other previously known symplectic schemes, although only a
partial optimization has been applied. In this sense, there is still room for
further improvement.

Finally, we should mention that the results achieved by p8 are up to two 
orders of magnitude better than those provided by McL8 for other examples we 
have tested. These include the simple pendulum, the Gaussian and the Henon- 
Heiles potentials. 

Fig. 1. Average errors in energy vs. number of evaluations for the sixth (a) and
eighth (b) order processed symplectic RKN methods

4 Final Comments

Although in the preceding treatment we have been concerned only with
Hamiltonian systems, it is clear that essentially similar considerations apply
to second order systems of ODE of the form

$$\ddot{x} = f(x), \quad x \in \mathbb{R}^l, \quad f : \mathbb{R}^l \to \mathbb{R}^l \tag{17}$$

when it is required that some qualitative or geometric property of (17) be
preserved in the numerical discretization. In fact, introducing the new variables
$z = (x, v)^T$, with $v = \dot{x}$, and the functions $f_A = (v, 0)$,
$f_B = (0, f(x)) \in \mathbb{R}^{2l}$, we have

$$\dot{z} = f_A + f_B, \tag{18}$$

with the systems $\dot{z} = f_A$ and $\dot{z} = f_B$ explicitly integrable in
closed form. In this case the Lie operators A, B are given by

$$A = v \cdot \nabla_x, \quad B = f(x) \cdot \nabla_v \tag{19}$$

and the methods of Table 1 can be directly applied for carrying out the numerical 
integration. This is so even for the physically relevant class of time-dependent 
non-linear oscillators of the form 



x-f ^x-t- fi(x) = f 2 (t) . 



( 20 ) 
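The splitting (18) can be made concrete with a short sketch. It is an illustration under assumptions, not one of the integrators of Table 1: the two exact flows are written out and composed in the simplest symmetric BAB pattern (Strang splitting), and the helper names `flow_A`, `flow_B`, `strang_step` and `integrate` are hypothetical.

```python
import math

def flow_A(x, v, h):
    """Exact flow of z' = f_A = (v, 0): positions drift, velocities are frozen."""
    return [xi + h * vi for xi, vi in zip(x, v)], list(v)

def flow_B(x, v, h, f):
    """Exact flow of z' = f_B = (0, f(x)): velocities are kicked, positions are frozen."""
    fx = f(x)
    return list(x), [vi + h * fi for vi, fi in zip(v, fx)]

def strang_step(x, v, h, f):
    """One symmetric BAB composition of the two exact flows (second order)."""
    x, v = flow_B(x, v, 0.5 * h, f)
    x, v = flow_A(x, v, h)
    x, v = flow_B(x, v, 0.5 * h, f)
    return x, v

def integrate(x, v, h, n, f):
    """Apply n steps of size h (h may be negative to integrate backwards)."""
    for _ in range(n):
        x, v = strang_step(x, v, h, f)
    return x, v

def pendulum(x):
    # Example right-hand side f(x) = -sin(x) for a one-dimensional pendulum.
    return [-math.sin(x[0])]
```

For example, `integrate([1.0], [0.0], 0.05, 100, pendulum)` followed by the same call with step `-0.05` returns to the starting point up to rounding, which reflects the time symmetry of the composition. Higher-order kernels only change the number of flows and the coefficients multiplying `h`.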
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Convergence of Finite Difference Method 
for Parabolic Problem with Variable Operator* 



Dejan Bojovic 
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Abstract. In this paper we consider the first initial-boundary value 
problem for the heat equation with variable coefficients in the domain
$(0,1)^2 \times (0,T]$. We assume that the solution of the problem and the
coefficients of equation belong to the corresponding anisotropic Sobolev 
spaces. Convergence rate estimates consistent with the smoothness of the 
data are obtained. 



1 Introduction 



For the class of finite difference schemes approximating parabolic
initial-boundary value problems, convergence rate estimates consistent with the
smoothness of data, i.e.

$$\|u - v\|_{W_2^{r,r/2}(Q_{h\tau})} \le C (h + \sqrt{\tau})^{s-r} \|u\|_{W_2^{s,s/2}(Q)}, \quad s > r, \tag{1}$$

are of major interest. Here $u = u(x,t)$ denotes the solution of the original
initial-boundary value problem, $v$ denotes the solution of the corresponding
finite difference scheme, $h$ and $\tau$ are discretisation parameters,
$W_2^{s,s/2}(Q)$ denotes the anisotropic Sobolev space, $W_2^{r,r/2}(Q_{h\tau})$
denotes the discrete anisotropic Sobolev space, and $C$ is a positive generic
constant, independent of $h$, $\tau$ and $u$. For problems with variable
coefficients the constant $C$ depends on the norms of the coefficients.

Estimates of this type have been obtained for parabolic problems with
coefficients which depend only on the variable x [1]. In this paper we derive
estimates for the parabolic problem with coefficients depending on the variables
x and t. The Bramble-Hilbert lemma [2] is used in our proof.



2 Initial-Boundary Value Problem and Its Approximation

Let us define the anisotropic Sobolev spaces $W_2^{s,s/2}(Q)$, $Q = \Omega \times I$,
$I = (0,T)$, as follows [5]:

$$W_2^{s,s/2}(Q) = L_2(I, W_2^{s}(\Omega)) \cap W_2^{s/2}(I, L_2(\Omega)),$$

* Supported by MST of Republic of Serbia, grant number 04M03/C 
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with the norm 












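We consider the first initial-boundary value problem for a parabolic equation
with variable coefficients in the domain $Q = \Omega \times (0,T] = (0,1)^2 \times (0,T]$:

$$\frac{\partial u}{\partial t} - \sum_{i=1}^{2} \frac{\partial}{\partial x_i}\left( a_i(x,t) \frac{\partial u}{\partial x_i} \right) = f(x,t), \quad (x,t) \in Q, \tag{2}$$

$$u = 0, \quad (x,t) \in \partial\Omega \times [0,T], \qquad u(x,0) = u_0(x), \quad x \in \Omega.$$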



We assume that the generalized solution of the problem (2) belongs to the
anisotropic Sobolev space $W_2^{s,s/2}(Q)$, $2 < s \le 4$ (see [4]), with the
right-hand side $f(x,t)$ which belongs to $W_2^{s-2,s/2-1}(Q)$. Consequently, the
coefficients $a_i = a_i(x,t)$ belong to the space of multipliers
$M(W_2^{s-1,(s-1)/2}(Q))$ [6], i.e. it is sufficient that [3]

$$a_i \in W_{4/(s-1)}^{s-1+\varepsilon,(s-1+\varepsilon)/2}(Q), \quad \text{for } 2 < s \le 3,$$

$$a_i \in W_2^{s-1,(s-1)/2}(Q), \quad \text{for } 3 < s \le 4.$$

We also assume that the coefficients $a_i(x,t)$ are decreasing functions in the
variable t, and $a_i(x,t) \ge c_0 > 0$.

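Let $\bar\omega$ be the uniform mesh on $\bar\Omega = [0,1]^2$ with the step size
$h$, $\omega = \bar\omega \cap \Omega$, $\gamma = \bar\omega \cap \partial\Omega$.
Let $\theta_\tau$ be the uniform mesh in $(0,T)$ with the step size $\tau$,
$\theta_\tau^+ = \theta_\tau \cup \{T\}$, $\bar\theta_\tau = \theta_\tau \cup \{0,T\}$.
We define the uniform mesh in $Q$: $Q_{h\tau} = \omega \times \theta_\tau$,
$Q_{h\tau}^+ = \omega \times \theta_\tau^+$, $\bar Q_{h\tau} = \bar\omega \times \bar\theta_\tau$.

It will be assumed that

$$c_1 h^2 \le \tau \le c_2 h^2, \quad c_1, c_2 = \text{const} > 0.$$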

We define finite differences in the usual manner [7]:

$$v_{x_i} = \frac{v^{+i} - v}{h} = v_{\bar x_i}^{+i}, \qquad v_t(x,t) = \frac{v(x, t+\tau) - v(x,t)}{\tau} = v_{\bar t}(x, t+\tau),$$

where $v^{\pm i}(x,t) = v(x \pm h r_i, t)$ and $r_i$ is the unit vector along the
$x_i$ axis. We also define the Steklov smoothing operators:

$$T_i^+ f(x,t) = \int_0^1 f(x + h x' r_i, t)\,dx' = T_i^- f(x + h r_i, t),$$

$$T_i^2 f(x,t) = T_i^+ T_i^- f(x,t) = \int_{-1}^{1} (1 - |x'|)\, f(x + h x' r_i, t)\,dx',$$

$$T_t^+ f(x,t) = \int_0^1 f(x, t + \tau t')\,dt' = T_t^- f(x, t + \tau).$$
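A small numerical sketch may help to see what the Steklov operators do. It works in one space variable and replaces the integrals by a midpoint quadrature, which is an assumption made purely for illustration; the function names are hypothetical. The squared operator averages $f$ against the hat function $1 - |x'|$, so it reproduces linear functions and maps $x^2$ to $x^2 + h^2/6$.

```python
import math

def steklov_plus(f, x, h, n=1000):
    """Midpoint-rule approximation of T^+ f(x) = int_0^1 f(x + h s) ds."""
    ds = 1.0 / n
    return sum(f(x + h * (k + 0.5) * ds) for k in range(n)) * ds

def steklov_minus(f, x, h, n=1000):
    """Midpoint-rule approximation of T^- f(x) = int_{-1}^0 f(x + h s) ds."""
    ds = 1.0 / n
    return sum(f(x + h * (-1.0 + (k + 0.5) * ds)) for k in range(n)) * ds

def steklov_squared(f, x, h, n=2000):
    """Midpoint-rule approximation of T^2 f(x) = int_{-1}^1 (1 - |s|) f(x + h s) ds."""
    ds = 2.0 / n
    total = 0.0
    for k in range(n):
        s = -1.0 + (k + 0.5) * ds
        total += (1.0 - abs(s)) * f(x + h * s)
    return total * ds
```

With an even `n` the kink of the hat function at $s = 0$ falls on a cell boundary, which keeps the quadrature second-order accurate. The identity $T^+ f(x) = T^- f(x + h)$ from the definitions can be checked directly with `steklov_plus(math.sin, 0.3, 0.1)` and `steklov_minus(math.sin, 0.4, 0.1)`.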
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The initial-boundary value problem (2) will be approximated on by the 
finite difference scheme 

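$$v_{\bar t} + L_h v = T_1^2 T_2^2 T_t^- f \ \text{ in } Q_{h\tau}^+, \qquad v = 0 \ \text{ on } \gamma \times \bar\theta_\tau, \qquad v = u_0 \ \text{ on } \omega \times \{0\}, \tag{3}$$

where

$$L_h v = -\frac{1}{2} \sum_{i=1}^{2} \left( (a_i v_{x_i})_{\bar x_i} + (a_i v_{\bar x_i})_{x_i} \right).$$

The finite-difference scheme (3) is the standard symmetric scheme with
the averaged right-hand side. Note that for $s \le 4$ the right-hand side may be
a discontinuous function, so the scheme without averaging is not well defined.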
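To make the structure of the scheme tangible, here is a rough one-dimensional analogue. Several simplifications are assumptions of this sketch and not part of the paper: one space dimension instead of two, a smooth manufactured solution $u = e^{-t}\sin(\pi x)$, a sample coefficient $a(x,t) = (1+x)(2-t)$ (positive and decreasing in $t$), and point values of $f$ in place of the Steklov averages, which is legitimate only because this $f$ is smooth. The time step is tied to the mesh by $\tau = h^2$.

```python
import math

def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm; lower[0] and upper[-1] are ignored."""
    n = len(diag)
    c, d = [0.0] * n, [0.0] * n
    c[0], d[0] = upper[0] / diag[0], rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x

def a(x, t):
    return (1.0 + x) * (2.0 - t)          # sample coefficient (assumption)

def exact(x, t):
    return math.exp(-t) * math.sin(math.pi * x)

def rhs_f(x, t):
    # f = u_t - (a u_x)_x for the manufactured solution above.
    s, c = math.sin(math.pi * x), math.cos(math.pi * x)
    flux_div = (2.0 - t) * math.exp(-t) * (math.pi * c - (1.0 + x) * math.pi**2 * s)
    return -math.exp(-t) * s - flux_div

def max_error(n, t_final=0.25):
    """Max-norm error at t_final of the implicit symmetric scheme with tau = h^2."""
    h = 1.0 / n
    tau = h * h
    steps = round(t_final / tau)
    xs = [j * h for j in range(n + 1)]
    v = [exact(x, 0.0) for x in xs]
    for k in range(1, steps + 1):
        t = k * tau
        av = [a(x, t) for x in xs]
        lower, diag, upper, rhs = [], [], [], []
        for j in range(1, n):
            c_minus = (av[j - 1] + av[j]) / (2.0 * h * h)   # from (a v_xbar)_x and (a v_x)_xbar
            c_plus = (av[j] + av[j + 1]) / (2.0 * h * h)
            lower.append(-c_minus)
            diag.append(1.0 / tau + c_minus + c_plus)
            upper.append(-c_plus)
            rhs.append(v[j] / tau + rhs_f(xs[j], t))
        v = [0.0] + solve_tridiagonal(lower, diag, upper, rhs) + [0.0]
    return max(abs(v[j] - exact(xs[j], steps * tau)) for j in range(n + 1))
```

Comparing `max_error(8)` with `max_error(16)` gives a rough feel for the convergence rate of such a scheme on smooth data; the paper's estimates concern the much weaker smoothness assumptions stated above.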



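3 Convergence of the Finite-Difference Scheme

Let u be the solution of the initial-boundary value problem (2) and v the
solution of the finite difference scheme (3). The error z = u − v satisfies the
conditions

$$z_{\bar t} + L_h z = \sum_{i=1}^{2} \eta_i + \varphi \ \text{ in } Q_{h\tau}^+, \qquad z = 0 \ \text{ on } \omega \times \{0\}, \qquad z = 0 \ \text{ on } \gamma \times \bar\theta_\tau,$$

where

$$\eta_i = T_1^2 T_2^2 T_t^- (D_i(a_i D_i u)) - \frac{1}{2}\left( (a_i u_{x_i})_{\bar x_i} + (a_i u_{\bar x_i})_{x_i} \right), \quad \text{and}$$

$$\varphi = u_{\bar t} - T_1^2 T_2^2 u_{\bar t}.$$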

We define discrete inner products 

(?;, w)uj = {v, X! t)w{x, t) , 

xGio 

tee+ 

and discrete Sobolev norms: 

|2 _ ll„,l|2 



( 4 ) 






= {v,v)u;, IHIq^, = > Ml^ = (LhV,v)^ 









2 

Qhr 



2=1 2=1 
The following assertion holds true: 

Lemma 1. Finite- difference scheme (4) satisfies a priori estimate 



( 5 ) 



where ip = X]i=i 0i + ‘P ■ 
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In a such a way, the problem of deriving the convergence rate estimate for 
finite-difference scheme (3) is now reduced to estimating the right-hand side 
terms in (8). 
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First of all, we decompose term rji in the following way [3]: rji = X)I=i Vik , 
where 

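$$\eta_{i1} = T_1^2 T_2^2 T_t^- (a_i D_i^2 u) - (T_1^2 T_2^2 T_t^- a_i)(T_1^2 T_2^2 T_t^- D_i^2 u),$$

$$\eta_{i2} = (T_1^2 T_2^2 T_t^- a_i - a_i)(T_1^2 T_2^2 T_t^- D_i^2 u),$$

$$\eta_{i3} = a_i (T_1^2 T_2^2 T_t^- D_i^2 u - u_{x_i \bar x_i}),$$

$$\eta_{i4} = T_1^2 T_2^2 T_t^- (D_i a_i D_i u) - (T_1^2 T_2^2 T_t^- D_i a_i)(T_1^2 T_2^2 T_t^- D_i u),$$

$$\eta_{i5} = (T_1^2 T_2^2 T_t^- D_i a_i - 0.5(a_{i,x_i} + a_{i,\bar x_i}))(T_1^2 T_2^2 T_t^- D_i u),$$

$$\eta_{i6} = 0.5(a_{i,x_i} + a_{i,\bar x_i})(T_1^2 T_2^2 T_t^- D_i u - 0.5(u_{x_i} + u_{\bar x_i})),$$

$$\eta_{i7} = 0.25(a_{i,x_i} - a_{i,\bar x_i})(u_{\bar x_i} - u_{x_i}).$$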
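As a consistency check, one can verify that the seven terms indeed add up to $\eta_i = T(D_i(a_i D_i u)) - \frac12((a_i u_{x_i})_{\bar x_i} + (a_i u_{\bar x_i})_{x_i})$. The short derivation below is a sketch added for the reader, with the abbreviation $T = T_1^2 T_2^2 T_t^-$ introduced only here; it uses nothing beyond the discrete product rule.

```latex
\begin{align*}
\eta_{i1} + \eta_{i2} + \eta_{i3}
  &= T(a_i D_i^2 u) - a_i\, u_{x_i \bar x_i}, \\
\eta_{i4} + \eta_{i5} + \eta_{i6}
  &= T(D_i a_i\, D_i u) - \tfrac14 (a_{i,x_i} + a_{i,\bar x_i})(u_{x_i} + u_{\bar x_i}), \\
(a_i u_{x_i})_{\bar x_i} &= a_i\, u_{x_i \bar x_i} + a_{i,\bar x_i}\, u_{\bar x_i},
\qquad
(a_i u_{\bar x_i})_{x_i} = a_i\, u_{x_i \bar x_i} + a_{i,x_i}\, u_{x_i}, \\
\tfrac14 (a_{i,x_i} + a_{i,\bar x_i})(u_{x_i} + u_{\bar x_i})
  - \tfrac12 (a_{i,x_i} u_{x_i} + a_{i,\bar x_i} u_{\bar x_i})
  &= \tfrac14 (a_{i,x_i} - a_{i,\bar x_i})(u_{\bar x_i} - u_{x_i}) = \eta_{i7}, \\
\sum_{k=1}^{7} \eta_{ik}
  &= T\big(D_i(a_i D_i u)\big)
   - \tfrac12 \big( (a_i u_{x_i})_{\bar x_i} + (a_i u_{\bar x_i})_{x_i} \big) = \eta_i .
\end{align*}
```

The last line uses $D_i(a_i D_i u) = a_i D_i^2 u + D_i a_i\, D_i u$ together with the linearity of $T$.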

Let us introduce the elementary rectangles $e = e(x,t) = \{(\xi_1, \xi_2, \nu) : \xi_i \in (x_i - h, x_i + h),\ i = 1,2,\ \nu \in (t - \tau, t)\}$.
The linear transformation $\xi_i = x_i + h x_i^*$, $i = 1,2$, $\nu = t + \tau t^*$,
defines a bijective mapping of the canonical rectangle
$E = \{(x_1^*, x_2^*, t^*) : |x_i^*| < 1,\ i = 1,2,\ -1 < t^* < 0\}$ onto $e$. We
define $u^*(x^*, t^*) = u^*(x_1^*, x_2^*, t^*) = u(x_1 + h x_1^*, x_2 + h x_2^*, t + \tau t^*)$.

The value of 77^1 at a mesh point {x, t) € can be expressed as 



77ii(x,t) = — •^ II k{x*i)k{x2)a*{x* ,t*)Dfu*{x* ,t*)dt*dx* 



E 






where k{x*) = l—\x*\. 

Thence we deduce that mi is a bounded bilinear functional of the argument 
{a*,u*) e Wq'^^‘^{E) X , where A > 0, 77 > 2 and q > 2 . 

Furthermore, mi = 0 whenever a* is a constant function or u* is a polynomial 
of degree two in x* and degree one in t*. Applying the bilinear version of the 
Bramble-Hilbert lemma [2], [8] we deduce that 



C 

\mi(x,t)\ < j^\a*\^x,x/2 



(B)' 






2q/{q-2) 



(E) 



, 0<A<1, 2 < q> 2 . 



Returning from the canonical variables to the original variables we obtain 

(E) — ^ 1^* I Q-Ilb 

_ 2(g-2) 



(E) — 

2q/(g-2)^^'’ 






2g/(g-2) 



(e) • 



Therefore, 



|r/*i(x,t)|<C/7^+^-4|a* 



v^/2(g) I'^l 0 — — 2 fi ^ 3 ^ q ^ 2 . 



Summing over the mesh we obtain 



' hr 

I'HiiWQhi- — 2)(Q 2 </r< 3 . ( 9 ) 
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Now suppose that 3 < s < 4 . Then the following Sobolev imbeddings hold: 

forA>4/(? and 

^A+M-i.(A+M-i)/2^g^ C , for /X > 3 - 4/g. 

Setting g = 4, A=l, /x = s — 1 in (9), using previous imbeddings, we obtain: 

h*i||Q^, < , 3 < s < 4. (10) 

In the case 2 < s < 3 , setting g = 4/(s — 2) , A = s — 2 , ^ = 2 in (9) and 
using imbeddings 

^a+m,(a+^)/ 2 ^^^ C <X-2)(Q) ’ ^ ^ 4/g and 

^ > for A < 4/g + £ , 

we have 



< Ch^ 2||q. 



>w 



s-l+e,(s-l + e)/2 



4/(s-l) 



(Q)' 



I w. 



3,8/1 



(Q) 



2 < s < 3 . 



( 11 ) 



The term rji^ is a bounded bilinear functional of the argument {a*,x*) G 
C{E) X , and 77*3 = 0 whenever u* is a polynomial of degree three 

in Xi and X 2 and degree one in t*. Recalling the Bramble-Hilbert lemma and 
imbeddings 

C C(Q) , for 2 < s < 3 and 
W'2”1’(®”1)/2(Q) C C(Q) , for 3 < s < 4 , 



we obtain estimates of the form ( 10 ) and ( 11 ) for rji^ . 

Using the same technique as before we obtain estimates of the form (10) and 
(11) for other terms rjik ■ In a such a way we have estimates: 



"ElCh II Oj II ^^s-l+e4s-l + e)/2^Q^ II ull 



4/(o-l) 



||?7*||q^, < C'/i® ||a 



I w. 



o-1,(o-1)/2jqj||W||^s 



s/2 



(Q) 



2 < s < 3, 

3 < s < 4. 



( 12 ) 

(13) 



Applying the linear version of the Bramble-Hilbert lemma we simply obtain 
estimate of the term (p : 

IIv^IIq^. < C'^"”^I|w|Ih/-,V 2 (q) , 2<s<4. (14) 



Combining ( 8 ) with (12)-(14) we obtain the final result: 

Theorem 1. The difference scheme (3) converges in the W^'^iQhr) norm, pro- 
vided Cl < T < C 2 . Furthermore, 



11^ '^11 IV 2 (Qfex) — '^lax II Ui II ^s^-l + c4s-l+e)/2 JQJ II u|| pj^s,s/ 2 jqj , 2 < s < 3, 



These estimates are consistent with the smoothness of the data. 
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Finite Volume Difference Scheme for a Stiff 
Elliptic Reaction-Diffusion Problem with a Line 

Interface 
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Abstract. We consider a singularly perturbed elliptic problem in two 
dimensions with stiff discontinuous coefficients of order O(1) and O(ε)
on the left and on the right of the interface, respectively. The solution of
this problem exhibits boundary and corner layers and is difficult to compute
numerically. The FVM is implemented on a condensed (Shishkin's) mesh
that resolves the boundary and corner layers, and we prove that it yields
an accurate approximation of the solution both inside and outside these
layers. We give error estimates in the discrete energy norm that hold
uniformly in the perturbation parameter ε. Numerical experiments
confirm these theoretical results.



1 Introduction 



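Let us consider the elliptic problem

$$L^- u \equiv -\Delta u(x,y) + q(x,y)u(x,y) = f(x,y), \quad (x,y) \in \Omega^-, \tag{1}$$

$$L^+ u \equiv -\varepsilon^2 \Delta u(x,y) + q(x,y)u(x,y) = f(x,y), \quad (x,y) \in \Omega^+, \tag{2}$$

$$[u(x,y)]_\Gamma = 0, \quad L^\Gamma u \equiv -\varepsilon^2 \frac{\partial u(+0,y)}{\partial x} + \frac{\partial u(-0,y)}{\partial x} = K(y), \quad y \in (0,1), \tag{3}$$

$$u(x,0) = g_s(x), \quad u(x,1) = g_n(x), \quad u(-1,y) = g_w(y), \quad u(1,y) = g_e(y), \tag{4}$$

$$\Omega^- = (-1,0) \times (0,1), \quad \Omega^+ = (0,1) \times (0,1), \quad \Gamma = \{0\} \times (0,1),$$

where

$$0 < q_0 \le q(x,y) \le q^0. \tag{5}$$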

We suppose that all data in the problem are sufficiently smooth with a possible
discontinuity at the interface line Γ.

Problems of type (1)-(5) are often called "stiff", see [3]. It is well known that
the solution u of (1)-(5) has singularities at the corners of the square Ω, [7], [2].
Since Γ is supposed to be regular, the solution can also have corner singularities
only at the intersection points of the boundary ∂Ω and the interface Γ. For
the solution to be sufficiently smooth, some compatibility conditions should be
fulfilled at these corners. Essential difficulties arise from the anisotropy of the
coefficients. Since the diffusion coefficients are small in Ω^+, boundary and corner
layers appear around the boundary of Ω^+. The interface conditions (3) on the
right also cause weak corner singularities around the two corners of Ω^- lying on
the interface Γ.

Let /3 is a positive constant and (3 < We assume that the solution of 

the problem (l)-(5) satisfies the following assumptions 



Assumption 1 Let the coefficients of problem (1)-(5) be sufficiently smooth
and satisfy all necessary compatibility conditions. Then the solution u can be
decomposed into a regular part that satisfies, for all integers l, m with l ≤ m,
m = 0, . . . , 3, the estimates

\D^-^Dy{x,y)\ < Cm, {x,y) G 17“ U I7+, (6) 

and singular part Ug 



f (x,y) (x,y) G 17 

1 E^{x, y) + Ey{x, y) + E^y+{x, y), (x, y) G 17+ 



that satisfies for a// m = 0, . . . , 4, the estimates 

\Dff-''D'yE^y-{x, y) \ < (exp(-/3(a: + y)/e) + exp(-/3(l -y + x )/£)) , 

jD^~‘DyE^(x, y)| < (exp(—/3x/s) + exp(— /3(1 — x)/e)) , 

lD^-‘D‘^Ey(x,y)l < Ce~‘ (exp(-/3y/e) + exp(-/3(l - y)/e)) , 
jD^~‘D‘^E^y+(x,y)l < Ce~”^ (exp(-/3(x + y)/£r) + exp(-/3(l - x + y)/e) 

+ exp(-/3(l + x- y)/e) + exp(-/3(2 -x- y)/e)) (8) 



where C is independent of e constant. 



2 Numerical Solution 

2.1 Grid and Grid Functions 

It is well known that in singularly perturbed problems uniform convergence
cannot be achieved on a uniform mesh. In order to obtain an ε-uniformly convergent
difference scheme we construct a partially uniform mesh w condensed close to
the boundary of Ω^+, see Fig. 1. Denote by w^+ the mesh in Ω^+ and by w^- the
mesh in Ω^-. In Ω^+ we construct a condensed (Shishkin's) mesh similar to the
one introduced in [4]:

$$w^+ = \{(x_i, y_j):\ x_i = x_{i-1} + h_i^x,\ y_j = y_{j-1} + h_j^y,\ j = 1, \dots, N_2,\ i = N_1 + 1, \dots, N_1 + N_2 = N,\ x_{N_1} = 0,\ x_N = 1,\ y_0 = 0,\ y_{N_2} = 1\},$$

$$h^x_{i+N_1} = h^y_j = h_1 = 4\delta/N_2, \quad i, j = 1, \dots, N_2/4 \ \cup\ 1 + 3N_2/4, \dots, N_2,$$

$$h^x_{i+N_1} = h^y_j = h_2 = 2(1 - 2\delta)/N_2, \quad i, j = 1 + N_2/4, \dots, 3N_2/4,$$

$$\delta = \min\{2\varepsilon \ln N_2/\beta,\ 1/4\}.$$
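The piecewise uniform mesh in one coordinate direction of Ω^+ can be generated as follows. This is a minimal sketch under the assumption that the layout is the usual Shishkin one: a fine step $h_1 = 4\delta/N_2$ in the first and last quarter of the index range and a coarse step $h_2 = 2(1-2\delta)/N_2$ in the middle half, with $\delta = \min\{2\varepsilon \ln N_2/\beta,\ 1/4\}$, so that the steps sum to one. The function name `shishkin_nodes` is hypothetical.

```python
import math

def shishkin_nodes(n2, eps, beta):
    """Return (nodes, delta) for a piecewise uniform mesh on [0, 1] with n2 cells."""
    if n2 < 4 or n2 % 4 != 0:
        raise ValueError("n2 must be a positive multiple of 4")
    delta = min(2.0 * eps * math.log(n2) / beta, 0.25)   # transition point
    h1 = 4.0 * delta / n2                  # fine step inside the layers
    h2 = 2.0 * (1.0 - 2.0 * delta) / n2    # coarse step outside the layers
    quarter = n2 // 4
    steps = [h1] * quarter + [h2] * (2 * quarter) + [h1] * quarter
    nodes = [0.0]
    for step in steps:
        nodes.append(nodes[-1] + step)
    nodes[-1] = 1.0                        # remove accumulated rounding drift
    return nodes, delta
```

For ε of order one the transition point is $\delta = 1/4$ and the mesh is uniform; for small ε a quarter of the cells are packed into each layer of width δ next to the end points.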
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Since in the left part the problem is not singularly perturbed we construct 
a coarse mesh in . Setting the conditions the mesh in 17+ to overlap this 

one in 17 , we chose N 2 = 2\ I > 2, ^-integer. Let sq = max s, s-integer, that 

satisfies /12 < <5/2®. Setting m = 2®“, we chose, Mi = 2m + N 2/2 and /13 = S/m. 
We also take = maxn, n-integer, that satisfies 1/n > / 12 , and ft .4 = 1/iVi. 
Then the mesh in 17“ is 

w~ = {{xi,yj), Xi = Xi-i + h/, yj = yj-i + h^.,i= 1, j = 1, 

xo = - 1 , XNi =0,yo = 0, yMi = 1 } , 

hf = hi, i= 1,.. . ,Ni, = h 2 , j = I + m, . . . , Mi - m, 

h^j = hs, j = I, . . . ,m U j = Ml — m + 1, . . . , Mi. 

For each point {xi, yj) of w we consider the rectangle etj = e{xi, yj), see Fig. 1,2. 
There are three types of grid points, boundary, regular and irregular, see Fig. 1. 




Fig. 1. Grid with local refinement on boundary and interior layers. Legend:
o - boundary points, x - regular points in w^-, * - regular points in w^+,
• - regular interface points, △ - irregular points in w^-, o - irregular
interface points.



Let u,v,g are given grid functions of a discrete arguments (xi,yj) € w. 
Denote gij = g{xi,yj), = g{xi T 0,2/j)- Further we shall use the standard 

notations 

h/ + 



hf = 






hf 



S+i 



hy = 



-] ■ -L+i - 

9ij — 



Vii — Vi- 



i-lj 



1 



2 



2hf 



^x,ij — ^x,i-\-lj' 



We shall also use the following discrete scalar product 
ACi-lMi-l N-1 M 2 -I 

('u,'c)o,t(; = ^ ^ ^ ^ hjUijVij H“ ^ ^ ^ ^ hjUijVij ^ 
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ATi-lMi-l M 2 -I 

^ ^ ^ ^ hjUijVij ^ ^ h ]^\ j -\- 

i=l j=l j=l 

N M 2 -I 

+ XI £ e’^hfh^UijVij, 

i^Ni + 1 3 = 1 



N\ — l Ml M2 IX \ 

'^/V. “r ^ 



(u, ?;]y,e = X X + X 

Z^l j^l 

N-1 M2 

+ X 

i=Ni + l j=l 

and corresponding norms 



-r c 'tjVi + 1 ly 

h’^jUN^,jVNl,j 



(9) 



||'*^||o,'U) — {u, u)q^iu, ||rt]|a:,e — {u, u]x,e, ll'*^]ly,£ — (^i '^]y,e- (1^) 

2.2 Finite Difference Approximation 

Balance Equation. Further in the numerical approximation we will use the
balance equation corresponding to problem (1)-(5). Integrating the equations
(1), (2) over a cell $e_{ij}$ that does not intersect the interface Γ we obtain

$$\int_{\partial e_{ij}} W_n \, ds = \iint_{e_{ij}} (f(x,y) - q(x,y)u(x,y)) \, dx\,dy, \tag{11}$$

where

$$W = -(p^x D_x u,\ p^y D_y u), \quad W_1 = -p^x D_x u, \quad W_2 = -p^y D_y u,$$

and $p^x$ and $p^y$ are the diffusion coefficients.

Let now the rectangle $e_{ij}$ intersect the interface Γ, and let $e^-_{ij}$,
$e^+_{ij}$ be the left and right parts of $e_{ij}$, respectively. Denote by
$S_{\Gamma,ij}$ the intersection of $e_{ij}$ and Γ. Using the interface
conditions (3), we obtain

$$\int_{\partial e^-_{ij} \setminus S_{\Gamma,ij}} W_n \, ds + \int_{\partial e^+_{ij} \setminus S_{\Gamma,ij}} W_n \, ds = \iint_{e^-_{ij}} (f - qu)\,dx\,dy + \iint_{e^+_{ij}} (f - qu)\,dx\,dy + \int_{S_{\Gamma,ij}} K(y)\,dy. \tag{12}$$


Approximation at the Regular Points At the regular points in w that do
not lie on the interface we will use the standard approximations, see [6]. Using
(11) at the regular points of w^- and w^+ we obtain



P Fxx,ij -pyUyy ,ij + QijUij — fij- 
At the regular points on the interface using (12) we get 



(13) 



S Ux,Nij Ux,N\j 
ux 



hU + ,1 - Kn , , 

— — - — Uyy^N^j + dNijUN^j = fNij + (14) 

^^Ni ^Ni 
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Fig. 2a. typical cell 
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Fig. 2b. irregular grid points 



Fig. 2. 



Approximation at the Irregular Points At an irregular interface point
$(x_i, y_j)$ it can happen that the point $(x_{i-1}, y_j)$ is not a grid point (see
Fig. 1, Fig. 2b). Then we cannot use the value of $U_{i-1,j}$ in the numerical
approximation. The needed values can be obtained from the values at the coarse
grid points on the left by piecewise polynomial interpolation. Below we shall use
piecewise constant interpolation. For ease of exposition consider the particular
situation of Fig. 2b. From the figure we see that two cases are possible.

1. The cell Cij+i coincides with only one cell {I = —1,0, 1). We shall suppose 
that the grid functions gij is extended over neighboring cell e^-ij as a con- 
stant and we approximate Ui-ij+i by Ut-ij . 

2. The cell e^+i, {I = —2, -1-2) coincides with two cells. Consider the case I = 2. 
We shall suppose that the grid functions gij is extended over neighboring 
cells Ci-ij and ei_ij +4 as a constant. We use the approximation C/i_ij +2 = 

{Ui-ij + Ui-ij+i) /2. 



Denote 






- Un,- 



ijk) 



k=l 






where rrij = 1,2 is the number of neighboring cells (on the left) of cell cniJ 
and is the length of laying on the side Then at the irregular 

interface points we obtain the approximation (14) with VxUnij instead of Ux,Nj- 
In order to obtain the approximation at the irregular points in w~ we set the 
requirements that the finite difference scheme conserves the mass. For example, 
in the particular situation of Fig. 2b we have 




rVj-3/2 

'Vj-2 



1 

Widy+ 

k=-l 




Widy + 




( 15 ) 



HHUn,,, - Un,-u) 



Denote 
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where rij = 1, 2 is the number of neighboring cells (on the right) of the cell cni-ij 
and Hj^ is the length of laying on the side Then at the irregu- 

lar points {xNi-i,yj) we obtain the approximation (13) with ^xU^ij instead 

of Ux,Nj- 

Formulation of the Discrete Problem Setting the boundary conditions 

UiO — 9s,i: UiMi — UiM'z — 9n,i^ — 9w^j^ — 9e,j: (^^) 

we obtain the finite difference problem (FDP) (13), (14), (16). The FDP can be 
written as a system of linear algebraic equations 

Au = F, (xi,yj) e w, (17) 

where in the right hand side F we have taken boundary conditions (16) into 
account. The following lemma shows that the matrix A is symmetric and positive 
definite and therefore invertible and problem (17) has unique solution. 

Lemma 1. The matrix A in (17) is symmetric and positive definite in the scalar 
product (., .)o,uj and for arbitrary discrete functions U,V on w satisfying zero 
boundary conditions (16) holds 

(U, V)a = {AU, U)o,n, = {VxU, VsP],,, + {Uy, Vy]y,e T {QU, V)o,u,- (18) 

Here Q is a diagonal matrix corresponding to q. 

Since the matrix A is symmetric and positive definite, it defines a norm
called the energy norm:

$$\|u\|_A = (Au, u)^{1/2}_{0,w}. \tag{19}$$

The matrix A is badly scaled, see Table 2 below. The condition number
of A tends to ∞ as ε → 0. But simple diagonal preconditioning improves the
situation significantly. Denote

$$D = \mathrm{diag}(d_{ii}), \quad d_{ii} = a_{ii}^{-1}.$$

Then the condition number of the matrix DA is independent of ε, see [5].
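The effect of diagonal scaling can be seen already on a toy example. The 2-by-2 matrix below is not the matrix A of the finite difference problem; it is a hypothetical badly scaled symmetric positive definite matrix chosen so that the eigenvalues are available in closed form. The condition number is taken here as the ratio of the largest to the smallest eigenvalue.

```python
import math

def cond_2x2(m):
    """Ratio of the two eigenvalues of a 2x2 matrix with real positive spectrum."""
    (a, b), (c, d) = m
    trace, det = a + d, a * d - b * c
    lam_max = 0.5 * (trace + math.sqrt(trace * trace - 4.0 * det))
    lam_min = det / lam_max      # avoids cancellation for tiny eigenvalues
    return lam_max / lam_min

def toy_matrix(eps):
    """Badly scaled SPD example: diag(1, eps) * [[2, -1], [-1, 2]] * diag(1, eps)."""
    return [[2.0, -eps], [-eps, 2.0 * eps * eps]]

def diagonally_scaled(m):
    """Return D m with D = diag(1 / m_ii)."""
    return [[m[i][j] / m[i][i] for j in range(2)] for i in range(2)]
```

For this example `cond_2x2(toy_matrix(eps))` grows like $\varepsilon^{-2}$, while `cond_2x2(diagonally_scaled(toy_matrix(eps)))` stays equal to 3 for every ε, since the scaled matrix is similar to the matrix with unit diagonal and off-diagonal entries $-1/2$.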



Uniform Convergence Next theorem presents the main result in the paper. 

Theorem 1. Letu G (7^(12“) UC'^( 17+) U (7(17+) is a solution of the differential 
problem (l)-(5) and satisfies Assumption 1. Let U is a solution of the discrete 
problem (17). Then the following e-uniformly estimates holds 

\\U-u\\a<c(^N---), ( 20 ) 

and if e = 0{N~^ In N) 



\\U-u\\a <C{N-^lnN) 



( 21 ) 






Let in addition u € ) U and satisfies the Assumtion 1. Then, if 

£ = the following estimates hold 

\\U-u\\A<c(^N-hnNy ||C/-u||oo.®<c(A^-hn7v), (22) 

for some positive constant C independent of the mesh and e. 

3 Numerical Results 

Consider the problem (l)-(5) with coefficients 

q{x,y) = 2, {x,y) S fi~ , q{x,y) = 1, (x,y) € C+. 

We chose the right-hand side and the boundary conditions so that the exact
solution is

r 2 + X - 2xy, (x,y) G 17“ , 

u(x, y) = < l + exp(—y/e)+exp(—x/e)+ 

i-exp(-(x + y)/e)-exp(-(l + x-y)/£), (x,y) G 17+. 



Table 1. Error on Shishkin mesh for ε = 1 (first column) and six smaller values of ε (negative powers of 10)

N_2, norm            ε = 1      decreasing ε →
N_2 = 4,  ||.||_∞    3.879e-4   6.746e-2   8.929e-2   1.143e-1   1.169e-1   1.172e-1   1.172e-1
N_2 = 8,  ||.||_∞    1.054e-4   2.437e-2   5.353e-2   5.246e-2   5.234e-2   5.233e-2   5.233e-2
N_2 = 16, ||.||_∞    2.683e-5   6.765e-3   3.044e-2   3.043e-2   3.042e-2   3.042e-2   3.042e-2
N_2 = 32, ||.||_∞    6.736e-6   1.737e-3   1.286e-2   1.286e-2   1.286e-2   1.286e-2   1.286e-2
N_2 = 64, ||.||_∞    1.686e-6   4.375e-4   4.843e-3   4.844e-3   4.844e-3   4.844e-3   4.844e-3
N_2 = 4,  ||.||_A    8.606e-4   4.633e-2   4.651e-2   5.913e-2   6.043e-2   6.056e-2   6.058e-2
N_2 = 8,  ||.||_A    2.400e-4   1.912e-2   1.149e-2   1.027e-2   1.029e-2   1.029e-2   1.029e-2
N_2 = 16, ||.||_A    6.206e-5   5.628e-3   6.988e-3   2.734e-3   1.879e-3   1.772e-3   1.761e-3
N_2 = 32, ||.||_A    1.570e-5   1.471e-3   3.416e-3   1.087e-3   4.494e-4   3.225e-4   3.069e-4
N_2 = 64, ||.||_A    3.932e-6   3.768e-4   1.308e-3   4.163e-4   1.411e-4   6.750e-5   5.498e-5


First we investigate the convergence rate. Table 1 shows the maximum and
energy norms of the error on the Shishkin mesh. The results in the table confirm
the theoretical ones. They show that there is convergence in the maximum norm
too. The approximate solution and the error in the case ε = 0.01 and N_2 = 16 are
shown in Fig. 3. Table 2 gives the condition numbers of the matrices A and DA,
where D is the diagonal preconditioning matrix defined in Section 2. We can see
from the table that this simple diagonal preconditioning improves the condition
number significantly and makes it independent of ε.
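The convergence rate can be read off the table by comparing errors on successive meshes. The sketch below assumes that the first five rows of Table 1 are maximum-norm errors for N_2 = 4, 8, 16, 32, 64 and uses only the first column (ε = 1) and the last column (the smallest ε); the observed order between two meshes is $\log_2$ of the ratio of the errors. The variable names are illustrative.

```python
import math

# Maximum-norm errors copied from Table 1 for N_2 = 4, 8, 16, 32, 64.
max_err_eps_1 = [3.879e-4, 1.054e-4, 2.683e-5, 6.736e-6, 1.686e-6]
max_err_smallest_eps = [1.172e-1, 5.233e-2, 3.042e-2, 1.286e-2, 4.844e-3]

def observed_orders(errors):
    """Observed order log2(e_N / e_2N) between each pair of successive meshes."""
    return [math.log2(errors[k] / errors[k + 1]) for k in range(len(errors) - 1)]

orders_eps_1 = observed_orders(max_err_eps_1)
orders_small = observed_orders(max_err_smallest_eps)
```

For ε = 1 the observed orders approach 2, as expected for a non-perturbed problem, while for the smallest ε they stay near 1, which is compatible with an error bound of the type $N^{-1} \ln N$.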
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Fig. 3. Approximate solution and error 
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Abstract. We describe the implementation and performance of a novel 
fill-minimization ordering technique for sparse LU factorization with par- 
tial pivoting. The technique was proposed by Gilbert and Schreiber in 
1980 but never implemented and tested. Like other techniques for or- 
dering sparse matrices for LU with partial pivoting, our new method 
preorders the columns of the matrix (the row permutation is chosen 
by the pivoting sequence during the numerical factorization). Also like 
other methods, the column permutation Q that we select is a permutation
that minimizes the fill in the Cholesky factor of Q^T A^T A Q. Unlike
existing column-ordering techniques, which all rely on minimum-degree
heuristics, our new method is based on a nested-dissection ordering of
A^T A. Our algorithm, however, never computes a representation of A^T A,
which can be expensive. We only work with a representation of A itself.
Our experiments demonstrate that the method is efficient and that
it can reduce fill significantly relative to the best existing methods. The
method reduces the LU running time on some very large matrices (tens 
of millions of nonzeros in the factors) by more than a factor of 2. 



1 Introduction 

Reordering the columns of sparse nonsymmetric matrices can significantly reduce 
fill in sparse LU factorizations with partial pivoting. Reducing fill in a factor- 
ization reduces the amount of memory required to store the factors, the amount 
of work in the factorization, and the amount of work in subsequent triangular 
solves. Symmetric positive definite matrices, which can be factored without piv- 
oting, are normally reordered to reduce fill by applying the same permutation 
to both the rows and columns of the matrix. When partial pivoting is required 
for maintaining numerical stability, however, pre-permuting the rows is mean- 
ingless, since the rows are exchanged again during the factorization. Therefore, 
we normally preorder the columns and let numerical consideration dictate the 
row ordering. Since columns are reordered before the row permutation is known, 
we need to order the columns such that fill is minimized no matter how rows 
are exchanged. (Some nonsymmetric factorization codes that employ pivoting, 
such as UMFPACK/MA38 [2,3], determine the column permutation during the 
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numerical factorization; such codes do not preorder columns so the technique in 
this paper does not apply to them.) 

A result by George and Ng [6] suggests one effective way to preorder the
columns to reduce fill. They have shown that the fill of the LU factors of PA
is essentially contained in the fill of the Cholesky factor of A^T A for every row
permutation P. (P is a permutation matrix that permutes the rows of A and
represents the actions of partial pivoting.) Gilbert [8] later showed that this
upper bound on the fill of the LU factors is not too loose, in the sense that for
a large class of matrices, for every fill element in the Cholesky factor of A^T A
there is a pivoting sequence P that causes the element to fill in the LU factors
of A. Thus, nonsymmetric direct sparse solvers often preorder the columns of A
using a permutation Q that minimizes fill in the Cholesky factor of Q^T A^T A Q.

The main challenge in column-ordering algorithms is to find a fill-minimizing
permutation without computing A^T A or even its nonzero structure. While
computing the nonzero structure of A^T A allows us to use existing symmetric
ordering algorithms and codes, it may be grossly inefficient. For example, when an
n-by-n matrix A has nonzeros only in the first row and along the main diagonal,
computing A^T A takes Ω(n^2) work, but factoring A takes only O(n) work.
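The example of a matrix with a full first row and a full diagonal is easy to reproduce. The sketch below stores only nonzero patterns (sets of column indices per row) and forms the pattern of A^T A explicitly, ignoring numerical cancellation; the function names are illustrative.

```python
def arrow_pattern(n):
    """Row-wise nonzero columns of an n-by-n matrix with a full first row and diagonal."""
    rows = [set(range(n))]                 # row 0 is full
    rows += [{i} for i in range(1, n)]     # remaining rows hold only the diagonal entry
    return rows

def nnz(rows):
    """Number of structural nonzeros of A."""
    return sum(len(cols) for cols in rows)

def ata_pattern(rows):
    """Structural nonzeros of A^T A: (i, j) is nonzero when some row of A
    has nonzeros in both column i and column j."""
    pattern = set()
    for cols in rows:
        for i in cols:
            for j in cols:
                pattern.add((i, j))
    return pattern
```

For n = 50 the matrix A has 2n − 1 = 99 nonzeros, whereas `ata_pattern` returns all n^2 = 2500 positions: the single dense row makes every pair of columns interact, so A^T A is completely full.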

This challenge has been met for the class of reordering algorithms based
on the minimum-degree heuristic. Modern implementations of minimum-degree
heuristics use a clique cover to represent the graph $G_A$ of the matrix¹ $A$
(see [5]). A clique cover represents the edges of the graph (the nonzeros in
the matrix) as a union of cliques, or complete subgraphs. The clique-cover
representation allows us to simulate the elimination process with a data
structure that only shrinks and never grows. There are two ways to initialize
the clique-cover representation of $G_{A^TA}$ directly from the structure of
$A$. Both ways create a data structure whose size is proportional to the number
of nonzeros in $A$, not the number of nonzeros in $A^TA$. From then on, the data
structure only shrinks, so it remains small even if $A^TA$ is relatively dense.
In other words, finding a minimum-degree column ordering for $A$ requires about
the same amount of work and memory as finding a symmetric ordering for
$A^T + A$, the symmetric completion of $A$.
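As a rough illustration of the row-clique idea, the following sketch treats each row of $A$ as one clique on the columns in which that row has nonzeros. The function name and the tiny example matrix are assumptions made for illustration; production minimum-degree codes use far more elaborate data structures.

```python
def column_neighbours(rows, j):
    """Neighbours of column j in the graph of A^T A, found from the rows
    of A alone: every row is a clique on the columns where it is nonzero."""
    nbrs = set()
    for cols in rows:
        if j in cols:
            nbrs |= cols
    nbrs.discard(j)
    return nbrs


# Nonzero structure of a 4-by-4 matrix, one set of column indices per row.
rows = [{0, 1, 2}, {1, 3}, {2, 3}, {3}]

# The clique cover needs one entry per nonzero of A, not per nonzero of A^T A.
storage = sum(len(cols) for cols in rows)
degrees = [len(column_neighbours(rows, j)) for j in range(4)]
```

The column degrees in the graph of $A^TA$ are obtained without ever forming the product.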

Nested-dissection ordering methods were proposed in the early 1970’s and 
have been known since then to be theoretically superior to minimum-degree 
methods for important classes of sparse symmetric definite matrices. Only in 
the last few years, however, have nested-dissection methods been shown experi- 
mentally to be more effective than minimum-degree methods. 

In 1980 Gilbert and Schreiber proposed a method for ordering $G_{A^TA}$ using
nested-dissection heuristics, without ever forming $A^TA$ [7,9]. Their method
uses wide separators, a term that they coined. They never implemented or tested
their proposed method.

The main contribution of this paper is an implementation and an exper- 
imental evaluation of the wide-separator ordering method, along with a new 
presentation of the theory of wide separators. 



¹ The graph $G_A = (V, E)$ of an $n$-by-$n$ matrix $A$ has a vertex set
$V = \{1, 2, \ldots, n\}$ and an edge set $E = \{(i,j) \mid a_{ij} \neq 0\}$.
We ignore numerical cancellations in this paper.

Modern symmetric ordering methods generally work as follows:

1. The methods find a small vertex separator that separates the graph G into 
two subgraphs with roughly the same size. 

2. Each subgraph is dissected recursively, until each subgraph is fairly small 
(typically several hundred vertices). 

3. The separators are used to impose a coarse ordering. The vertices in the 
top-level separator are ordered last, the vertices in the second-to-top level 
come before them, and so on. The vertices in the small subgraphs that are 
not dissected any further appear first in the ordering. The ordering within 
each separator and the ordering within each subgraph has not yet been 
determined. 

4. A minimum-degree algorithm computes the final ordering, subject to the 
coarse ordering constraints. 

While there are many variants, most codes use this overall framework. 
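The coarse ordering produced by steps 1 to 3 can be sketched as a short recursion. This is only an illustration: `find_separator` stands in for a real separator code, `path_separator` is a toy that splits a path graph at its middle vertex, and the final constrained minimum-degree pass of step 4 is omitted.

```python
def nested_dissection_order(vertices, find_separator, leaf_size=2):
    """Order the vertices so that every separator comes after the two
    subgraphs it separates; small subgraphs are left in their given order."""
    if len(vertices) <= leaf_size:
        return list(vertices)
    part1, sep, part2 = find_separator(vertices)
    return (nested_dissection_order(part1, find_separator, leaf_size)
            + nested_dissection_order(part2, find_separator, leaf_size)
            + list(sep))


def path_separator(vertices):
    """Toy separator for a path graph: the middle vertex."""
    mid = len(vertices) // 2
    return vertices[:mid], vertices[mid:mid + 1], vertices[mid + 1:]


order = nested_dissection_order(list(range(7)), path_separator)
```

The top-level separator (vertex 3 of the path) ends up last, the second-level separators come just before it within their own halves, and the undissected leaves come first.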

Our methods apply the same framework to the graph of $A^TA$, but without
computing it. We find separators in $A^TA$ by finding wide separators in
$A^T + A$. We find a wide separator by finding a conventional vertex separator
and widening it by adding to it all the vertices that are adjacent to the
separator in one of the subgraphs. Such a wide separator corresponds to a vertex
separator in $A^TA$. Just like symmetric methods, our methods recursively
dissect the graph, but using wide separators. When the remaining subgraphs are
sufficiently small, we compute the final ordering using a constrained
column-minimum-degree algorithm. We use existing techniques to produce a
minimum-degree ordering of $A^TA$ without computing $G_{A^TA}$ (either the
row-clique method or the augmented-matrix method).

Experimental results show that our method can reduce the work in the LU
factorization by up to a factor of 3 compared to state-of-the-art
column-ordering codes. The running times of our method are higher than the
running times of strict minimum-degree codes, such as COLAMD [10], but they are
low enough to easily justify using the new method. On many matrices, including
large ones, our method significantly reduces the work compared to all the
existing column-ordering methods. On some matrices, however, constraining the
ordering using wide separators increases fill rather than reducing it.

The rest of the paper is organized as follows. Section 2 presents the theory 
of wide separators and algorithms for finding them. Our experimental results 
are presented in Section 3. We discuss our conclusions from this research in 
Section 4. 

2 Wide Separators: Theory and Algorithms 

Our column-ordering methods find separators in $G_{A^TA}$ by finding a
so-called wide separator in $G_{A^T+A}$. We work with the graph of $A^T + A$
and not with $G_A$ for two reasons. First, this simplifies the definitions and
proofs. Second, to the best of our knowledge all existing vertex-separator
codes work with undirected graphs, so there is no point in developing the
theory for the directed graph $G_A$.

A vertex subset $S \subseteq V$ of an undirected graph $G = (V, E)$ is a
separator if the removal of $S$ and its incident edges breaks the graph into
two components $G_1 = (V_1, E_1)$ and $G_2 = (V_2, E_2)$, such that any path
between $i \in V_1$ and $j \in V_2$ passes through at least one vertex in $S$.
A vertex set is a wide separator if every path between $i \in V_1$ and
$j \in V_2$ passes through a sequence of two vertices in $S$ (one after the
other along the path).

Our first task is to show that every wide separator in $G_{A^T+A}$ is a
separator in $G_{A^TA}$. (Proofs are omitted from this abstract due to lack of
space.)

Theorem 1. A wide separator in $G_{A^T+A}$ is a separator in $G_{A^TA}$.

The converse is not always true. There are matrices with separators in
$G_{A^TA}$ that do not correspond to wide separators in $G_{A^T+A}$. The
converse of the theorem is true, however, when there are no zeros on the main
diagonal of $A$:

Theorem 2. If there are no zeros on the diagonal of $A$, then a separator in
$G_{A^TA}$ is a wide separator in $G_{A^T+A}$.

Given a code that finds conventional separators in an undirected graph,
finding wide separators is easy. The separator and its neighbors in either
$G_1$ or $G_2$ form a wide separator:

Lemma 1. Let $S$ be a separator in an undirected graph $G$. The sets
$S_1 = S \cup \{i \mid i \in V_1,\ (i,j) \in E \text{ for some } j \in S\}$ and
$S_2 = S \cup \{i \mid i \in V_2,\ (i,j) \in E \text{ for some } j \in S\}$
are wide separators in $G$.

The proof of the lemma is trivial. The sizes of $S_1$ and $S_2$ are bounded
by $d|S|$, where $d$ is the maximum degree of vertices in $S$. Given $S$, it is
easy to enumerate $S_1$ and $S_2$ in time $O(d|S|)$. This running time is
typically insignificant compared to the time it takes to find $S$.
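The widening step of Lemma 1 amounts to a few set operations. The sketch below assumes the graph is given as a dictionary of adjacency sets; the function name and the path-graph example are illustrative only.

```python
def widen_separator(adj, S, V1, V2):
    """Return the two candidate wide separators of Lemma 1:
    S plus its neighbours in V1, and S plus its neighbours in V2."""
    S = set(S)
    neighbours = set()
    for j in S:
        neighbours |= adj[j]          # O(d|S|) work in total
    S1 = S | (neighbours & set(V1))
    S2 = S | (neighbours & set(V2))
    return S1, S2


# Path graph 0 - 1 - 2 - 3 - 4 with the conventional separator {2}.
adj = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
S1, S2 = widen_separator(adj, {2}, V1={0, 1}, V2={3, 4})
```

With `S1 = {1, 2}`, the only path from vertex 0 to vertices 3 and 4 passes through the consecutive separator vertices 1 and 2, as the definition of a wide separator requires.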

Which one of the two candidate wide separators should we choose? A wide
separator that is small and that dissects the graph evenly reduces fill in the
Cholesky factor of $A^TA$, and hence in the LU factors of $A$. The two criteria
are usually contradictory. Over the years it has been determined that the best
strategy is to choose a separator that is as small as possible, as long as the
ratio of the number of vertices in $G_1$ and $G_2$ does not exceed 2 or so.

The following method, therefore, is a reasonable way to find a wide separator:
select the smaller of $S_1$ and $S_2$, unless the smaller wide separator
unbalances the separated subgraphs (so that one is more than twice as large as
the other) but the larger does not. Our code, however, is currently more naive
and always chooses the smaller wide separator.
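The selection rule described above (not the simpler rule the authors' code uses) could be sketched as follows. The function name, the `max_ratio` parameter and the example sets are assumptions made for illustration.

```python
def choose_wide_separator(S1, S2, V1, V2, max_ratio=2.0):
    """Pick the smaller wide separator unless it unbalances the remaining
    subgraphs (ratio above max_ratio) and the larger one does not."""
    def imbalance(sep):
        a, b = len(V1 - sep), len(V2 - sep)
        small, large = min(a, b), max(a, b)
        return float("inf") if small == 0 else large / small

    smaller, larger = sorted((S1, S2), key=len)
    if imbalance(smaller) > max_ratio and imbalance(larger) <= max_ratio:
        return larger
    return smaller


V1, V2 = {0, 1, 2}, {3, 4, 5, 6, 7, 8}
S1 = {9, 0, 1}        # separator {9} widened into V1: leaves 1 versus 6 vertices
S2 = {9, 3, 4, 5}     # separator {9} widened into V2: leaves 3 versus 3 vertices
chosen = choose_wide_separator(S1, S2, V1, V2)
```

In this example the smaller candidate leaves subgraphs of sizes 1 and 6, so the larger but balanced candidate is selected.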

3 Experimental Results 

3.1 Experimental Setup 

The experiments that this section describes test the effectiveness and
performance of several column-ordering codes. We have tested our new codes,
which implement nested-dissection-based orderings, as well as several existing
ordering codes.

Our codes build a hierarchy of wide separators and then use the separators
to constrain a minimum-degree algorithm. We obtain the wide separators by
widening separators in $G_{A^T+A}$ that SPOOLES [1] finds. SPOOLES is a new
library of sparse ordering and factorization codes that is being developed by
Cleve Ashcraft and others. Our codes then invoke a column-minimum-degree
code to produce the final ordering. One minimum-degree code that we use is
SPOOLES's multi-stage-minimum-degree (MSMD) algorithm, which we run on
the augmented matrix. The other minimum-degree code that we used is a version
of COLAMD [10] that we modified to respect the constraints imposed by the
separators.

The existing minimum-degree codes that we have tested include COLAMD,
SPOOLES's MSMD (operating on the augmented matrix with no separator
constraints), and COLMMD, a column minimum-degree code originally written by
Joseph W.-H. Liu and distributed with SuperLU.

We use the following acronyms to refer to the ordering methods: MSMD
refers to SPOOLES's minimum-degree code operating on the augmented matrix
without constraints, WS+MSMD refers to the same minimum-degree code
but constrained to respect wide separators, and similarly for COLAMD and
WS+COLAMD.

In one set of experiments we first reduced the matrices to block triangular
form (see [12]) and applied the ordering and factorization to the diagonal
blocks in the reduced form.

We always factor the reordered matrix using SuperLU [4,11], a state-of- 
the-art sparse-LU-with-partial-pivoting code. SuperLU uses the BLAS; we used 
the standard Fortran BLAS for the experiments. We plan to use a higher- 
performance implementation of the BLAS for the final version of the paper. 

We conducted the experiments on a 500MHz dual Pentium III computer with 
1 GByte of main memory running Linux. This machine has two processors, but 
our code only uses one processor. 

We tested the ordering methods on a set of nonsymmetric sparse matrices
from Tim Davis's sparse matrix collection². We used all the nonsymmetric
matrices in Davis's collection that were not too small (less than 0.1 second
factorization time with one of the ordering methods) and that did not require
more than 1 GByte to factor. The matrices are listed in Table 1. For further
details about the matrices, see Davis's web site (the final version of this
paper will include a table listing the order and number of nonzeros for each
matrix; the table is omitted from this abstract due to lack of space).

3.2 Results and Analysis 

Table 1 summarizes the results of our experiments. The table shows experiments 
without reduction to block triangular form. 

² http://www.cise.ufl.edu/~davis/sparse/

Columns 2-9 in the table show that wide-separator ordering techniques are
effective. Wide-separator (WS) orderings are the most effective ordering
methods, in terms of work in the factorization, on 23 out of the 41 test
matrices. WS orderings are the most effective on 9 out of the 10 largest
matrices (largest in terms of work in the factorization). On the single matrix
out of the 10 largest where a WS ordering was not the best, it required only 7%
more flops to factor.

The reduction in work due to wide separators is often significant. On the
largest matrix in our test suite, li, wide separators reduce factorization work
by almost a factor of 2. The reduction compared to the unconstrained MD methods
is also highly significant on raefsky3, epb3, and graham1.

When WS orderings do poorly compared to MD methods, however, they
sometimes do significantly worse. On ex40, for example, using wide separators
requires 2.66 times the number of flops that COLAMD alone requires. The
slowdowns on some of the lhr and bayer matrices are even more dramatic, but
reduction to block triangular form often resolves these problems.

On lhr14c, for example, reduction to block triangular form prior to the
ordering and factorization reduced the ordering time by more than a factor of
10 and reduced the number of nonzeros in WS+MSMD from 2.1e9 to 8.2e7 (and
to 4.5e7 for MSMD alone). These experiments are not reported here in detail
because we conducted them too late. The complete results will appear in the
final version of the paper.

As columns 7-9 in the table show, reducing flop counts generally translates
into reducing the running time of the factorization algorithm and reducing the
size of the LU factors. The detailed comparisons between ordering methods other
than COLAMD and WS+COLAMD are similar and are omitted from the table.
Hence, our remarks concerning the flop counts above also apply to the running
time of the factorization code and the amount of memory required to carry out
the factorization and to store the factors.

Wide-separator orderings are more expensive to compute than strict
minimum-degree orderings, but the extra cost is typically small compared to the
subsequent factorization time. Column 10 in the table shows the cost of ordering
relative to the cost of the factorization. The table shows that a few matrices
take longer (sometimes much longer) to order than to factor. This happens to
matrices that arise in chemical engineering (the bayer matrices and the lhr
matrices). We hope to resolve this issue using reduction to block triangular
form. Another point that emerges from the table is that on small matrices,
wide-separator orderings are expensive to compute relative to the cost of the
factorization.

4 Conclusions and Future Work 

Our main conclusion from this research is that hybrid wide-separator/minimum- 
degree column orderings are effective and inexpensive to compute. They often 
reduce substantially the amount of time and storage required to factor a sparse 
matrix with partial pivoting, compared to minimum-degree orderings such as 
COLAMD and COLMMD. They are more expensive to compute than minimum-

Table 1. A comparison of wide-separator and minimum-degree column orderings.
Columns 2-6 show the number of floating-point operations (flops) required
to factor the test matrices using 5 different ordering methods. The flop counts
for the most efficient method (or methods) are printed in bold. Columns 7-9
show the effectiveness of WS+COLAMD relative to that of COLAMD: %T compares
factorization running times (< 100 means that WS+COLAMD is better), %F
compares flops, and %Z compares the number of nonzeros in the factors. The last
column, denoted %O, shows the time to find wide separators as a percentage of
the WS+COLAMD factorization time.



Name        MSMD      WS+MSMD   COLMMD    COLAMD    WS+COLAMD  %T   %F   %Z   %O
bwm2000     2.75E+04  7.86E+04  2.75E+04  2.86E+04  2.83E+04   200  98   98   100
cavity04    9.57E+05  9.57E+05  1.30E+06  6.37E+05  6.37E+05   100  100  100  0
poli_large  1.45E+05  1.45E+05  1.65E+05  1.70E+05  1.70E+05   100  100       37
bayer10     1.04E+07  3.01E+07  1.24E+07  1.01E+07  1.45E+07   125  143       2040
lhr04c      1.47E+07  6.73E+07  1.68E+07  1.77E+07  3.25E+07   164  183  128  150
bayer02     1.09E+07  1.09E+07  9.72E+06  9.28E+06  9.28E+06   117  100       362
rw5151      3.12E+07  3.16E+07  3.29E+07  3.29E+07  2.92E+07   92   88   92   37
lhr07c      2.78E+07  2.22E+08  3.16E+07  3.06E+07  6.67E+07   196  217  135  132
bayer04     2.79E+07  2.79E+07  2.41E+07  2.51E+07  2.51E+07   100  100       447
lhr10c      3.72E+07  3.32E+08  3.98E+07  3.92E+07  1.31E+08   197  334  152  533
lhr11c      4.77E+07  4.77E+07  5.18E+07  5.22E+07  5.22E+07   116  100  100  343
memplus     3.95E+07  3.95E+07  4.01E+07  5.60E+09  5.60E+09   94   100  100  0
ex19        9.45E+07  1.12E+08  7.08E+07  4.07E+07  1.09E+08   230  267  151  83
lhr14c      8.68E+07  2.10E+09  8.46E+07  8.51E+07  2.60E+08   191  305  149  284
bayer01     6.12E+07  4.82E+08  6.47E+07  4.76E+07  1.11E+08   121  233       8857
ex35        1.03E+08  1.33E+08  9.25E+07  5.65E+07  1.38E+08   207  244  136  34
cavity26    1.77E+08  1.39E+08  1.71E+08  2.04E+08  1.48E+08   75   72   85   19
epb1        1.47E+08  1.22E+08  1.02E+08  1.43E+08  1.25E+08   116  87   95   27
goodwin     6.42E+08  5.77E+08  5.06E+08  1.91E+09  6.44E+08   34   33   57   15
epb2        7.14E+08  5.17E+08  7.14E+08  6.46E+08  5.64E+08   107  87   97   15
garon2      1.18E+09  1.20E+09  1.28E+09  1.06E+09  1.98E+09   184  186  119  5
shyy161     1.07E+09  9.00E+08  1.04E+09  1.03E+09  7.56E+08   77   73   92   34
graham1     1.69E+09  9.24E+08  1.42E+09  1.33E+09  9.54E+08   72   71   82   11
epb3        2.22E+09  8.09E+08  1.79E+09  2.06E+09  1.18E+09   77   57   83   27
olafu       3.16E+09  2.71E+09  2.96E+09  2.84E+09  2.58E+09   73   90   89   21
rim         2.89E+09  2.01E+09  2.12E+09  5.55E+09  1.77E+09   31   31   54   28
venkat50    4.30E+09  4.36E+09  5.84E+09  4.51E+09  4.91E+09   93   108  85   13
venkat25    4.30E+09  4.36E+09  5.84E+09  4.51E+09  4.91E+09   94   108  85   14
venkat01    4.30E+09  4.36E+09  5.79E+09  4.46E+09  4.87E+09   93   109  85   14
ex40        3.69E+09  3.39E+09  2.29E+09  1.08E+09  2.87E+09   268  265  146  8
af23560     5.33E+09  7.01E+09  4.95E+09  4.52E+09  9.50E+09   181  210  133  1
raefsky3    1.05E+10  5.24E+09  7.75E+09  1.04E+10  5.47E+09   45   52   64   11
ex11        1.55E+10  1.19E+10  1.43E+10  1.19E+10  1.12E+10   76   94   92   6
raefsky4    1.56E+10  7.80E+09  1.07E+10  1.10E+10  8.56E+09   62   77   80   9
psmigr_1    1.48E+10  1.48E+10  1.66E+10  1.68E+10  1.68E+10   94   100  100  0
psmigr_3    1.58E+10  1.58E+10  1.72E+10  1.74E+10  1.74E+10   95   100  100  0
psmigr_2    1.56E+10  1.56E+10  1.74E+10  1.76E+10  1.76E+10   94   100  100  0
wang3       3.12E+10  1.55E+10  3.47E+10  2.78E+10  2.45E+10   84   88   90   0
wang4       3.70E+10  2.45E+10  3.52E+10  3.37E+10  2.72E+10   81   80   89   0
bbmat       5.97E+10  4.77E+10  4.46E+10  4.46E+10  5.82E+10   109  130  112  4
li          1.59E+11  8.10E+10  2.17E+11  1.63E+11  8.15E+10   44   50   72   4

degree orderings, but the cost is typically small relative to the cost of the
subsequent factorization.

Reducing the matrices to block triangular form before ordering seems to
resolve the difficulties with some of the chemical engineering matrices,
but we are still investigating this issue.

Abstract. In this paper we obtain an unconditional convergence result
for discretization methods of type Fractional Steps Runge-Kutta, which 
are highly efficient in the numerical resolution of parabolic problems 
whose coefficients depend on time. These methods combined with stan- 
dard spatial discretizations will provide totally discrete algorithms with 
low computational cost and high order of accuracy in time. We will show 
the efficiency of such methods, in combination with upwind difference 
schemes on special meshes, to integrate numerically singularly perturbed 
evolutionary convection-diffusion problems. 



1 Introduction 

Let $u(x,t)$ be the solution of a two-dimensional space evolution problem
($x \in \mathbb{R}^2$) which admits an operational formulation in the form

$$\frac{du}{dt}(t) + A(t)\,u(t) = g(t), \qquad u(0) = u_0, \tag{1}$$

where $A(t) : \mathcal{D} \subseteq H \to H$, $t \in [0,T]$, are unbounded
linear operators in a Hilbert space $H$, with scalar product $((\cdot,\cdot))$
and associated norm $\|\cdot\|$, of functions defined in a domain
$\Omega \subseteq \mathbb{R}^2$. Let us also suppose that $A(t)$ admits a
natural decomposition in two simpler addends, $A_1(t)$, $A_2(t)$, such that
$A_i(t) : \mathcal{D}_i \subseteq H \to H$, for $i = 1,2$, where
$\mathcal{D}_1 \cap \mathcal{D}_2 = \mathcal{D}$ and $A_i(t)$, for $i = 1,2$,
are maximal and coercive operators for all $t \in [0,T]$, i.e.

$$\forall f \in H,\ \exists\, v \in \mathcal{D}_i \text{ such that } v + A_i(t)v = f, \quad\text{and}\quad \exists\, \alpha_i > 0 \text{ such that } ((A_i(t)v, v)) \ge \alpha_i \|v\|^2,\ \forall v \in \mathcal{D}_i.$$

In convection-diffusion problems, we will consider the case

$$A(t) = A_1(t) + A_2(t) \quad\text{with}\quad A_i(t) = -d_i(x_1,x_2,t)\,\frac{\partial^2}{\partial x_i^2} + v_i(x_1,x_2,t)\,\frac{\partial}{\partial x_i} + k_i(x_1,x_2,t),$$



L. Vulkov, J. Wasniewski, and P. Yalamov (Eds.): NAA 2000, LNCS 1988, pp. 133-143, 2001.

with $d_i(x_1,x_2,t) \ge d_0 > 0$ and $k_i(x_1,x_2,t) \ge 0$, $i = 1, 2$.

In this paper we show the advantages of Fractional Step Runge-Kutta
methods (FSRK for short) for discretizing the time variable, combined with a
standard spatial discretization via finite difference or finite element
schemes. For simplicity we choose a simple upwind scheme on rectangular meshes
to discretize the spatial variables of (1), obtaining a totally discrete scheme
of type (2):



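$$\begin{cases}
U_h^{m,j} = U_h^m + \Delta t \displaystyle\sum_{i=1}^{j} a_{ji}^{n(i)} \bigl(-A_{n(i)h}(t_{m,i})\,U_h^{m,i} + g_{n(i)h}(t_{m,i})\bigr), & j = 1,\ldots,s,\\[2mm]
U_h^{m+1} = U_h^m + \Delta t \displaystyle\sum_{i=1}^{s} b_i^{n(i)} \bigl(-A_{n(i)h}(t_{m,i})\,U_h^{m,i} + g_{n(i)h}(t_{m,i})\bigr),
\end{cases}
\qquad\text{with}\quad
n(i) = \begin{cases} 1, & \text{if } i \text{ is even},\\ 2, & \text{if } i \text{ is odd}; \end{cases}
\tag{2}$$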



here, $h$ denotes the mesh size and $[\cdot]_h$ is the restriction of a
function defined in $\Omega$ to a rectangular mesh $\Omega_h$ that covers
$\Omega$. Using these algorithms we obtain approximations $U_h^m$ to
$[u(x,t_m)]_h$ with $t_m = m\Delta t$. $A_{1h}(t)$ and $A_{2h}(t)$ are the
difference operators that appear by discretizing, with the upwind scheme, the
operators $A_1(t)$ and $A_2(t)$ given in (2). The intermediate approximations
$U_h^{m,i}$, for $i = 1,\ldots,s$, are called stage values of the FSRK method,
and we can consider them as approximations to $[u(x,t_{m,i})]_h$ at
$t_{m,i} = t_m + c_i\Delta t$. Finally we take
$g_{n(i)h}(t_{m,i}) = [g_{n(i)}(t_{m,i})]_h$ with $g_1(t) + g_2(t) = g(t)$.

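So, a FSRK method is determined by the choice of the coefficients $c_i$,
$a_{ji}^k$ and $b_i^k$.

We will often refer to a FSRK method by means of its coefficients, arranged
in a Butcher tableau as follows:

$$\begin{array}{c|c|c}
Ce & A^1 & A^2 \\ \hline
 & (b^1)^T & (b^2)^T
\end{array}$$

where $e = (1,\ldots,1)^T$, $C = \mathrm{diag}(c_1,\ldots,c_s)$,
$A^k = (a_{ji}^k)$ and $b^k = (b_i^k)_{i=1}^{s}$, $k = 1,2$, verifying

$$\begin{cases}
a_{ji}^{k} = 0, & \text{if } i > j,\ k \in \{1,2\},\\
a_{ji}^{3-n(i)} = 0, & \forall i \in \{1,\ldots,s\},\\
b_i^{3-n(i)} = 0, & \forall i \in \{1,\ldots,s\}.
\end{cases}$$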

(3)

To solve (2) we must solve a family of linear systems of the form
$(I + \Delta t\, a_{ii}^{n(i)} A_{n(i)h}(t_{m,i}))\,U_h^{m,i} = f_{ih}$, where
the right-hand side $f_{ih}$ is computed explicitly from previous stages and
some evaluations of $g_{ih}(t)$, and where the matrices
$(I + \Delta t\, a_{ii}^{n(i)} A_{n(i)h}(t_{m,i}))$ are tridiagonal. Therefore
the computational cost of the numerical integration with these methods is, in
every time step, linearly dependent on the number of mesh points in
$\Omega_h$. This is an important advantage with respect to classical implicit
methods: it is possible to obtain unconditional convergence (i.e. without
restrictions relating $\Delta t$ and $h$) while the order of complexity of the
algorithm is the same as that of an explicit method, whereas the classical
implicit methods have a higher computational cost because they must solve
block tridiagonal linear systems.
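The linear cost per stage comes from the tridiagonal structure. The sketch below is an illustration only, assuming a one-dimensional operator $-d\,u'' + v\,u' + k\,u$ with constant coefficients, $v \ge 0$ (so that upwinding uses a backward difference), and homogeneous Dirichlet boundary values; the names `thomas`, `upwind_step_system` and `tau` (standing for $\Delta t\, a_{ii}$) are not from the paper.

```python
def thomas(lower, diag, upper, rhs):
    """Solve a tridiagonal system in O(n) operations.
    lower[i] multiplies x[i-1] and upper[i] multiplies x[i+1] in row i;
    lower[0] and upper[n-1] are ignored."""
    n = len(diag)
    c = [0.0] * n
    d = [0.0] * n
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):                       # forward elimination
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):              # back substitution
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def upwind_step_system(n, h, tau, d, v, k):
    """Diagonals of I + tau*A_h, with A_h the upwind discretization of
    -d u'' + v u' + k u on n interior points of a uniform mesh."""
    lower = [-tau * (d / h**2 + v / h)] * n
    diag = [1.0 + tau * (2.0 * d / h**2 + v / h + k)] * n
    upper = [-tau * d / h**2] * n
    return lower, diag, upper


n, h, tau = 5, 1.0 / 6.0, 0.05
lower, diag, upper = upwind_step_system(n, h, tau, d=1.5, v=2.0, k=1.0)
rhs = [1.0, 2.0, 3.0, 2.0, 1.0]
x = thomas(lower, diag, upper, rhs)
```

In two dimensions each fractional stage involves only one spatial direction, so the same solve is repeated independently along every mesh line of that direction.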

It is well known that the time variation of the coefficients $d_i$, $v_i$ and
$k_i$ hampers the analysis of the unconditional convergence when we use a
numerical integrator in a parabolic problem (see [3], [4]). In fact, the use of
a space semidiscretization in a parabolic problem results in a stiff problem of
the form

$$\begin{cases} u'(t) = J(t)\,u(t) + f(t),\\ u(0) = u_0; \end{cases} \tag{4}$$

classically, methods that verify the AN-stability property have been used in 
the time integration of (4) in order to preserve the contractivity of their exact 
solutions independently of At and h. This property is preserved by simple inte- 
grators of low order (implicit Euler) or by high order Runge-Kutta methods if 
they are totally implicit (like Gauss methods), i.e., to obtain high orders in time 
with classical methods preserving AN-stability increases the computational cost 
because we must use non semi explicit methods (see [5]). 

Nevertheless, recent papers (see [4]) show that if we consider time variation
of the form

$$\|(A(t) - A(s))u\| \le L\,|t - s|^{\alpha}\,(\|u\| + \|A(s)u\|), \quad u \in \mathcal{D},\ \alpha \in (0,1], \text{ and } L > 0,$$

the AN-stability can be weakened to A-stability, obtaining unconditional
convergence for standard time integrators, like Runge-Kutta methods.

Since the FSRK methods which we propose are semiexplicit, the AN-stability 
will be preserved only by the simplest methods of low order (see [7]). 

For the discretizations obtained via FSRK methods we give a stability result
by imposing on the operators $A_i(t)$ the following condition:

$$\|A_i(t')u - A_i(t)u\| \le |t - t'|\,M_i\,\|A_i(t)u\|, \quad \forall i = 1,2,\ \forall t,t' \in [0,T], \tag{5}$$

which is related to a Lipschitz variation in the coefficients of $A_i(t)$. In
this case, A-stability is a sufficient condition to guarantee the unconditional
stability of scheme (2), at least in finite time intervals.

2 Convergence 

To study the convergence in a simple way, we split the analysis of the global
error into two components: on the one hand, the contribution of the time
semidiscretization process and, on the other, the contribution of the space
semidiscretization stage.



2.1 Time Semidiscretization 

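Let $U^m \equiv U^m(x)$ be the solution, approximating $u(x,t_m)$, obtained
with the scheme

$$\begin{cases}
U^0 = u_0 \in \mathcal{D},\\[1mm]
U^{m,j} = U^m + \Delta t \displaystyle\sum_{i=1}^{j} a_{ji}^{n(i)} \bigl(-A_{n(i)}(t_{m,i})\,U^{m,i} + g_{n(i)}(t_{m,i})\bigr), & j = 1,\ldots,s,\\[2mm]
U^{m+1} = U^m + \Delta t \displaystyle\sum_{i=1}^{s} b_i^{n(i)} \bigl(-A_{n(i)}(t_{m,i})\,U^{m,i} + g_{n(i)}(t_{m,i})\bigr),
\end{cases} \tag{6}$$

where the coefficients $c_i$, $a_{ji}^k$ and $b_i^k$ verify the restrictions (3).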

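In order to study scheme (6) we introduce the following tensorial notation:
given $M = (m_{ij}) \in \mathbb{R}^{s \times s}$ we define
$\bar M = (m_{ij} I_H)$, given $v = (v_i) \in \mathbb{R}^s$ we define
analogously $\bar v = (v_i I_H)$, and

$$\hat A_k^m = \begin{pmatrix}
A_k(t_{m,1}) & 0 & \cdots & 0\\
0 & A_k(t_{m,2}) & \cdots & 0\\
\vdots & \vdots & \ddots & \vdots\\
0 & 0 & \cdots & A_k(t_{m,s})
\end{pmatrix}
\quad\text{and}\quad
G_k^m = \begin{pmatrix} g_k(t_{m,1})\\ g_k(t_{m,2})\\ \vdots\\ g_k(t_{m,s}) \end{pmatrix},
\qquad k = 1,2, \tag{7}$$

where $I_H$ is the identity in $H$. In [1] the following three results are proved: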



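Theorem 1. Let $\{A_i(t)\}_{i=1}^{2}$ be maximal and coercive operators; then
scheme (6) admits a unique solution, bounded independently of $\Delta t$, which
can be expressed as

$$U^{m+1} = R(-\Delta t \hat A_1^m, -\Delta t \hat A_2^m)\,U^m + \Delta t\, S(\Delta t \hat A_1^m, \Delta t \hat A_2^m, \Delta t G_1^m, \Delta t G_2^m), \tag{8}$$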



i=i 



2 = 1 



S{AtA^, AtA^, AtG^, AtG^) = 

2 2 



2=1 j—1 k — 1 2=1 



In (8) we have separated the solution of scheme (6) into two terms:
the contribution of the solution at the previous instant, $U^m$, operated by
$R(-\Delta t \hat A_1^m, -\Delta t \hat A_2^m)$, and the contribution of the
source terms $g_i(t)$, which we have grouped in
$S(\Delta t \hat A_1^m, \Delta t \hat A_2^m, \Delta t G_1^m, \Delta t G_2^m)$.
This decomposition permits us to deduce immediately that the contractivity of
the FSRK method (i.e., $\|U^{m+1} - V^{m+1}\| \le \|U^m - V^m\|$, where $U^m$
and $V^m$ are solutions obtained from different initial conditions $U^0$ and
$V^0$) is equivalent to
$\|R(-\Delta t \hat A_1^m, -\Delta t \hat A_2^m)\| \le 1$. A weaker stability
condition can be introduced as follows: a method of Runge-Kutta type is said to
be A-stable if for any two solutions $U^m$ and $V^m$, obtained from the initial
conditions $U^0$ and $V^0$, with non-homogeneous terms $g_i(t)$ and
$g_i(t) + \varepsilon_i(t)$ respectively, it holds that

$$\|U^m - V^m\| \le C \Bigl( \|U^0 - V^0\| + \max_{t \in [0,T]} \sum_{i=1}^{2} \|\varepsilon_i(t)\| \Bigr). \tag{10}$$

Using (8) again, it is easy to check that

$$\|R(-\Delta t \hat A_1^m, -\Delta t \hat A_2^m)\| \le e^{\beta \Delta t} \tag{11}$$

and $\|S(\Delta t \hat A_1^m, \Delta t \hat A_2^m, \Delta t G_1^m, \Delta t G_2^m)\| \le C\,\Delta t\,(\|G_1^m\| + \|G_2^m\|)$
are sufficient conditions to verify the stability property (10) for
$m \le T/\Delta t$.

Theorem 2. Let $(C, A^1, (b^1)^T, A^2, (b^2)^T)$ be a FSRK method given by (3),
whose coefficients verify

$$|R(z_1,z_2)| \equiv \Bigl| 1 + \sum_{i=1}^{2} z_i\,(b^i)^T \Bigl( I - \sum_{j=1}^{2} z_j A^j \Bigr)^{-1} e \Bigr| \le 1, \quad \forall z_1, z_2 \in \mathbb{C} \text{ with } \mathrm{Re}(z_i) \le 0,\ i = 1,2, \tag{12}$$

(this property is called A-stability of the FSRK method) and let $A_1(t)$,
$A_2(t)$ be maximal, coercive and commuting operators verifying (5). Then there
exists a constant $\beta$ independent of $\Delta t$ such that (11) is verified.
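As a small worked check of an A-stability condition of this kind, the sketch below evaluates the stability function of the simplest fractional step method, two successive implicit Euler substeps, one for each operator. It assumes the standard form $R(z_1,z_2) = 1 + \sum_k z_k (b^k)^T (I - z_1A^1 - z_2A^2)^{-1} e$; the coefficient arrays and the assignment of operators to stages are illustrative choices, not taken from the paper.

```python
def stability_function(z1, z2):
    """R(z1, z2) for a two-stage fractional implicit Euler method:
    stage 1 is implicit in operator 1, stage 2 in operator 2."""
    A = {1: [[1, 0], [1, 0]], 2: [[0, 0], [0, 1]]}   # illustrative coefficients
    b = {1: [1, 0], 2: [0, 1]}
    z = {1: z1, 2: z2}
    # M = I - z1*A^1 - z2*A^2 (a 2-by-2 lower triangular matrix here)
    M = [[(1 if i == j else 0) - sum(z[k] * A[k][i][j] for k in (1, 2))
          for j in range(2)] for i in range(2)]
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    # Solve M x = e with e = (1, 1)^T by Cramer's rule
    x = [(M[1][1] - M[0][1]) / det, (M[0][0] - M[1][0]) / det]
    return 1 + sum(z[k] * (b[k][0] * x[0] + b[k][1] * x[1]) for k in (1, 2))


samples = [(-1 + 2j, -0.5), (-3j, -10), (0, -2 + 1j)]
values = [stability_function(z1, z2) for z1, z2 in samples]
```

For this method the formula collapses to $R(z_1,z_2) = 1/((1-z_1)(1-z_2))$, whose modulus is at most 1 whenever both real parts are nonpositive.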

Main idea of proof

To bound $R(-\Delta t \hat A_1^m, -\Delta t \hat A_2^m)$ we decompose this
operator in the form
$R(-\Delta t \hat A_1^m, -\Delta t \hat A_2^m) = R(-\Delta t A_1(t_m), -\Delta t A_2(t_m)) + \Delta t\, P$
and we use that, under the hypotheses of this theorem,
$\|R(-\Delta t A_1(t_m), -\Delta t A_2(t_m))\| \le 1$ (see [7]), and also
$\|P\| \le CM$ (see [1]), to deduce that

$$\|R(-\Delta t \hat A_1^m, -\Delta t \hat A_2^m)\| \le 1 + CM\Delta t \le e^{\beta \Delta t},$$

with $\beta = CM$, $M = \max_{i=1,2} \{M_i\}$, and $C$ a constant that depends
on the size of the coefficients of the FSRK method. $\diamond$
In [8] it is also proved that if the FSRK method is strongly A-stable, i.e.,
if it verifies (12) and there exist $c < 1$ and $K$, sufficiently large, such
that $|R(z_1,z_2)| \le c$ if $\mathrm{Re}(z_i) \le 0$, for $i \in \{1,2\}$, and
$|z_1| + |z_2| > K$, then the contractivity result
$\|R(-\Delta t A_1(t_m), -\Delta t A_2(t_m))\| \le 1$ can be improved to
$\|R(-\Delta t A_1(t_m), -\Delta t A_2(t_m))\| \le e^{-\beta' \Delta t}$, with
$\beta' > 0$ and independent of $\Delta t \in (0, \Delta t_0]$. Because of this,
in some cases the stability result (11) can also be improved to contractivity
by using a strongly A-stable FSRK method. To be more precise, for
$\Delta t \in (0, \Delta t_0]$ and $M_i$ small enough¹, negative values of
$\beta$ can be considered in (11). Combining this contractivity property with
the consistency property, which we introduce next, the convergence can be
studied in intervals of infinite length.

To study the consistency of scheme (6) we define the local error as

$$e^{m+1} = u(t_{m+1}) - R(-\Delta t \hat A_1^m, -\Delta t \hat A_2^m)\,u(t_m) - \Delta t\, S(\Delta t \hat A_1^m, \Delta t \hat A_2^m, \Delta t G_1^m, \Delta t G_2^m)$$

and we say that a FSRK method is consistent of order $p$ if for sufficiently
smooth data $u(t_m)$, $G_1^m$ and $G_2^m$, it is verified that
$\|e^{m+1}\| \le C(\Delta t)^{p+1}$, $\forall m \ge 0$, as $\Delta t \to 0$,
where $C$ is a constant independent of $\Delta t$.

Theorem 3. Let us consider a FSRK method satisfying the order conditions 

1 

= H ? ’ 

^'=1 (r - } + 1) + ^ pk 

< k=j 

r 

Vr = 1, . . . ,p,V(pi, ...,pr)G{0,...p- 1}’' verifying 1 < r + ^ pfc < p 

k —1 

, and V (ii, . . . , ir) G (1, 2}’’, 

together with the reductions $(C)^k e - k A^i (C)^{k-1} e = 0$,
$i \in \{1,2\}$, $k = 1,\ldots,k_0$, and let us apply it to a problem of type
(1), whose solution verifies the following smoothness requirements
r <c, g { 1 , 2 }, 

I / G {1, . . . ,p- fc + 1}, iG{1,2}, ko<k<p, 

I and Pi + . . . + Pi < p — k — I + 1', 

[ < G, with u'At) = ~{AAt)u{t) - g^{t)). 

Then

$$\|e^{m+1}\| \le C(\Delta t)^{p+1}, \tag{13}$$

where $C$ is a constant independent of $\Delta t$.

To study the convergence of the semidiscrete scheme we define the global
error associated to the time semidiscretization as
$E_{\Delta t} = \sup_{m \le T/\Delta t} \|u(t_m) - U^m\|$ and we say that
scheme (6) is convergent of order $p$ if $E_{\Delta t} \le C(\Delta t)^p$,
where $C$ is a constant independent of $\Delta t$. It is immediate to check
that, if scheme (6) verifies (11) and (13), then it is convergent of order $p$.
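The step from stability and consistency to convergence can be sketched as follows, assuming that (11) reads $\|R\| \le e^{\beta\Delta t}$ and that (13) reads $\|e^{m+1}\| \le C(\Delta t)^{p+1}$, and writing $\epsilon^m = u(t_m) - U^m$ (a symbol introduced here only for this sketch). Subtracting (8) from the definition of the local error gives a one-step recurrence, and since $U^0 = u_0$ we have $\epsilon^0 = 0$:

$$\epsilon^{m+1} = R(-\Delta t \hat A_1^m, -\Delta t \hat A_2^m)\,\epsilon^m + e^{m+1}
\quad\Longrightarrow\quad
\|\epsilon^{m+1}\| \le e^{\beta\Delta t}\,\|\epsilon^m\| + C(\Delta t)^{p+1},$$

$$\|\epsilon^{m}\| \le C(\Delta t)^{p+1} \sum_{k=0}^{m-1} e^{k\beta\Delta t}
\le m\,\max(1, e^{\beta T})\,C(\Delta t)^{p+1}
\le T\,\max(1, e^{\beta T})\,C\,(\Delta t)^{p}, \qquad m\Delta t \le T.$$

The constant depends on $T$ and $\beta$ but not on $\Delta t$, which is the sense in which the convergence is unconditional on a finite time interval.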



¹ This property is related to a small time variation of the coefficients
$d_i(x_1,x_2,t)$, $v_i(x_1,x_2,t)$ and $k_i(x_1,x_2,t)$, at least for
sufficiently large values of $t$.

2.2 Total Discretization

The totally discrete scheme (2) that we propose is obtained by discretizing
(6) in space using the simple upwind schemes defined on rectangular meshes.

To study the convergence of scheme (2), we define the global error associated
to the total discretization (2) at the instant $t_m$ as
$E_h^m = \|[u(t_m)]_h - U_h^m\|_h$² and we say that the discretization is
convergent, of order $p$ in time and of order $q$ in space, if

$$E_h^m \le C(h^q + \Delta t^p), \tag{14}$$

where $C$ is a constant independent of $\Delta t$ and of $h$.

To analyze the convergence of the total discretization we separate, in a
certain way, the contributions of the time and space discretization stages to
the global error $E_h^m$; the contribution of the space discretization to the
global error will be studied by means of the term, called local error of the
spatial discretization, that we define as
$e_h^m = \|[\hat u^m]_h - \hat U_h^m\|_h$, where $\hat u^m$ is obtained by
taking one step with the semidiscrete scheme (6) from the initial point
$U^{m-1} = u(t_{m-1})$, and $\hat U_h^m$ is obtained by taking one step with the
totally discrete scheme (2) from the initial point
$U_h^{m-1} = [u(t_{m-1})]_h$.

Analogously to the study of the convergence realized in [2] we can prove the 
following 

Theorem 4. Let u{t) he the unique solution of problem (1), with 17 sufficiently 
smooth, in such a way that {Ai{t)u{t)}1^^ and are C^{L2), and let us 

consider a FSRK method and the simple upwind discretizations {Aihft)}^^^ of 
{Ai{t)}f^i in a uniform rectangular mesh 17/j. Then 

$$e_h^m \le C\,\Delta t\,h, \qquad (15)$$

where $h$ is the mesh size of $\Omega_h$.

To obtain the convergence of the totally discrete scheme (2), we bound the global error in the form

$$E_h^m \le \|[u(t_m)]_h - [\hat U^m]_h\|_h + \|[\hat U^m]_h - \hat U_h^m\|_h + \|\hat U_h^m - U_h^m\|_h; \qquad (16)$$

using the formula (8) for a step in the time integration, it is immediate that

$$\hat U_h^m - U_h^m = R(-\Delta t\,A_{1h}^m, -\Delta t\,A_{2h}^m)\,\big([u(t_{m-1})]_h - U_h^{m-1}\big),$$

and applying this equality in (16), we obtain a recurrence law for the global errors that, under the hypotheses necessary for (13), (15) and

$$\|R(-\Delta t\,A_{1h}^m, -\Delta t\,A_{2h}^m)\|_h \le e^{\beta \Delta t}, \qquad (17)$$

with $\beta$ independent of $h$, permits us to prove (14).

Observe that the bound (17) is a particular case of (11) if the operators $A_{1h}(t)$, $A_{2h}(t)$ preserve the monotonicity and commutativity properties of the operators $A_1(t)$, $A_2(t)$. Such properties are easily checked in some cases, for example for simple finite difference schemes on rectangular meshes in problems of type (1) whose coefficients $d_i(x_1,x_2,t)$, $v_i(x_1,x_2,t)$ and $k_i(x_1,x_2,t)$ do not depend on the spatial variable $x_j$ with $i \ne j$. In other cases, for example for arbitrary spatial variations in the coefficients of (1), we do not know of theoretical results which permit us to obtain (11); nevertheless, in the numerical experiments carried out in some cases with non-commuting operators, the numerical solutions obtained also show the same stable behaviour.

^ $\|\cdot\|_h$ is a suitable norm for the space of discrete functions defined in $\Omega_h$.



3 Numerical Results 



3.1 A Parabolic Problem with Time Dependent Coefficients 



To integrate the following convection-diffusion problem: 



du j d^u 1 d^u 

at 



+ "Cl ^ -|- + (fci + k2)u 



■u(0, y, t) = m( 1, y, t) = u{x, 0, t) = u{x, 1, t) = 0, 
u{x, y, 0) = X® (1 - x)® y^ (1 - yf, x,y e [0, 1], 
g{x, y, t) = e“‘ sin(x(l - x)) sin(y(l - y)), 



= 9{x,y,t), 
X e [0, 1], 



{x,y,t) € n X [0,5], 
t/€[0,l], te[0,5]. 



with di = ^2 = (1 + e *), = (U-x)(2-|-e *), z ;2 = (1 J- y)(2 — e *), fci = U-x^ 

and k 2 = 1 + sin(Try), we combine the third order FSRK method given by (18)^. 



^ The details of the construction of this method can be found in [1]; there it is proven that at least five stages are necessary to obtain third order and that six stages are convenient. Note also that each RK method composing the FSRK method can be reduced to a third order SDIRK method with three stages (see also [6]).
$$
\left(\begin{array}{llllll}
0.435866521508459 \\
0.435866521508459 & 0 \\
0.264133478491540 & 0 & 0.435866521508459 \\
0.524203567293128 & 0 & -0.224203567293127 & 0 \\
0.054134244066592 & 0 & 0.0741327129164892 & 0 & 0.435866521508459 \\
2.005981609913539 & 0 & 1.336337252930893 & 0 & -2.59231886284469 & 0 \\
2.838287230686191 & 0 & 2.207497360663944 & 0 & -4.04578459135012 & 0
\end{array}\right)
$$

$$
\left(\begin{array}{llllll}
0 \\
0 & 0.435866521508459 \\
0 & 0.170931386851894 & 0 \\
0 & -0.13586652150846 & 0 & 0.435866521508459 \\
0 & 0.062944816984284 & 0 & -0.09326511998115 & 0 \\
0 & -0.543014480247272 & 0 & 0.8571479587388134 & 0 & 0.435866521508459 \\
0 & -0.781001854745764 & 0 & 1.100752954088072 & 0 & 0.680248900657693
\end{array}\right)
\qquad (18)
$$

$$Ce = (0.435866521508459,\ 0.435866521508459,\ 0.7,\ 0.3,\ 0.56413347849154,\ 0.75)^T.$$
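The coefficients in (18) can be sanity-checked numerically. The following is a minimal sketch in Python with NumPy (a tool chosen here for illustration, not one used in the paper). It assumes that the first six rows of each array are stage coefficients and that the seventh row holds the weights; under that reading it checks that the rows of the first array sum to the abscissae listed above and that each weight row sums to one.

```python
import numpy as np

# First array of (18): six stage rows (assumed layout), lower triangular.
rows_1 = [
    [0.435866521508459],
    [0.435866521508459, 0.0],
    [0.264133478491540, 0.0, 0.435866521508459],
    [0.524203567293128, 0.0, -0.224203567293127, 0.0],
    [0.054134244066592, 0.0, 0.0741327129164892, 0.0, 0.435866521508459],
    [2.005981609913539, 0.0, 1.336337252930893, 0.0, -2.59231886284469, 0.0],
]
# Seventh row of each array, read here as the weights (assumption).
weights_1 = [2.838287230686191, 0.0, 2.207497360663944, 0.0, -4.04578459135012, 0.0]
weights_2 = [0.0, -0.781001854745764, 0.0, 1.100752954088072, 0.0, 0.680248900657693]
# Abscissae as printed after (18).
c = np.array([0.435866521508459, 0.435866521508459, 0.7, 0.3, 0.56413347849154, 0.75])

# Largest deviation between row sums of the first array and the abscissae.
row_sums_1 = np.array([sum(r) for r in rows_1])
max_row_defect = np.max(np.abs(row_sums_1 - c))

# Deviation of each weight row from the consistency condition sum(b) = 1.
weight_defects = (abs(sum(weights_1) - 1.0), abs(sum(weights_2) - 1.0))

print(max_row_defect, weight_defects)
```

The defects are at the level of the number of digits printed in (18), which suggests that the digits of the first array and of both weight rows were transcribed consistently.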

Using this scheme we have computed the numerical errors 

^N,At — max \U ’ U ’ ^ (^Xi, yj 

(xi,yj)GO 1 

tm —mAt^ m=l,2,..., ^ 

where U^’^{xi,yj,tm) is the numerical solution obtained in the spatial point 
(xi,yj) and in the time point tm, using a uniform rectangular mesh (with 
N X N points ) and constant time step K. To obtain these numerical errors we 
have taken At = in order to the contributions, in the global error, of time 
and space discretization stages are of the same order. 



Table 1. errors {Em^ai) 



N 


8 


16 


32 


64 


128 


256 


512 


EN,At 


4.0206E- 4^ 


.3801E -41 


.2921E - 41 


..7694E - 51 


A91E - 5) 


.782E - 


.992E - 6 



3.2 A Singular Perturbation Case 



In the following singular perturbation problem 



du ^ d_ 

at “1 dx 



— do 



dy- 



+ Wi If + «2 fr + (fci + fc 2 )w = g{x, y, t), 



\ u{0,y,t) =u{l,y,t) = 0, ye [0,1], te[0, 5], 

< u{x,0,t) = u{x,l,t) = 0, a:e[0, 1], te[0,5], 
■u{x,y,0) = h{x)h{y), x,y € f2, 

, gix, y, t) = e~*h{x) + e~*h{y), {x, y,t) £ ft x [0, 5], 



{x,y,t) £ fi X [0,5], 
with di = e(2 — e *)(1 + xy), d.2 = e(2 — e *)(2 — y), = (2 + sin(Trt) e *)(1 + 

ti 2 = (2 — sin(7Tt) e“‘)(2 + sin(Try)), fei = 1 + fc 2 = 1 + sin(7r?/) and 
^(C) = (e ^ — e“e — (1 — e~ )C) we have used the FSRK method given by 
(18) for the time integration. For the spatial discretization we have used the 
simple upwind scheme and the rectangular Shishkin meshes given in [2]. 

In table 2 we show, for each e, the numerical errors 

Ee,N,At= max 

{xi,yj)GO 1 

obtained taking At = We have evaluated these errors from t = 0.1 until 
T = 5 since an order reduction occurs only in the first step because of data in 
t = 0 do not verify (13) for p = 3. 

Remark 1. When the meshes are non uniform, we have used bilinear interpo- 
lation in the spatial variables x and y to evaluate {xi^yj^tm) in the 

points (xi,yj) of the mesh Sl^. Note that the order is less than one (con- 
cretely log{N) + At^) due to the singular perturbation nature of the prob- 
lem. More numerical tests with arbitrary time and space dependence on the 
coefficients can be seen in [1]. 
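The statement that the order is less than one can be checked directly from the tabulated maxima. The sketch below (Python, used here only for illustration) takes the values in the last row of Table 2 and computes the observed order between consecutive meshes as $\log_2(E_N / E_{2N})$; the variable names are hypothetical.

```python
import math

# Maximum errors over all values of the perturbation parameter,
# copied from the last row of Table 2.
N_values = [8, 16, 32, 64, 128, 256]
max_errors = [2.2623e-2, 1.6484e-2, 1.1930e-2, 8.5843e-3, 5.9604e-3, 3.8451e-3]

# Observed order between consecutive meshes: log2(E_N / E_2N).
observed_orders = [
    math.log2(e_coarse / e_fine)
    for e_coarse, e_fine in zip(max_errors, max_errors[1:])
]

for N, order in zip(N_values, observed_orders):
    print(f"N = {N:3d} -> {2 * N:3d}: observed order {order:.3f}")
```

All five observed orders lie strictly between 0 and 1 and increase slowly with $N$, which is the behaviour one would expect from a bound containing a logarithmic factor.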



Table 2. Numerical errors ($E_{\varepsilon,N,\Delta t}$)

$\varepsilon$             N = 8       N = 16      N = 32      N = 64      N = 128     N = 256
1                         2.7102E-4   1.5276E-4   8.1128E-5   4.2080E-5   2.1447E-5   1.0873E-5
$10^{-1}$                 7.9712E-3   1.1429E-2   7.6423E-3   4.5348E-3   2.5052E-3   1.3207E-3
$10^{-2}$                 2.2611E-2   1.4857E-2   1.0849E-2   7.8887E-3   5.4609E-3   3.4323E-3
$10^{-3}$                 2.2424E-2   1.6301E-2   1.1793E-2   8.4844E-3   5.8843E-3   3.7733E-3
$10^{-4}$                 2.2604E-2   1.6466E-2   1.1915E-2   8.5733E-3   5.9517E-3   3.8365E-3
$10^{-5}$                 2.2621E-2   1.6482E-2   1.1928E-2   8.5831E-3   5.9596E-3   3.8443E-3
$10^{-6}$                 2.2623E-2   1.6484E-2   1.1930E-2   8.5841E-3   5.9603E-3   3.8451E-3
$10^{-7}$                 2.2623E-2   1.6484E-2   1.1930E-2   8.5843E-3   5.9604E-3   3.8444E-3
$10^{-8}$                 2.2623E-2   1.6484E-2   1.1930E-2   8.5843E-3   5.9604E-3   3.8444E-3
$10^{-9}$                 2.2623E-2   1.6484E-2   1.1930E-2   8.5843E-3   5.9604E-3   3.8444E-3
$E^{\max}_{N,\Delta t}$   2.2623E-2   1.6484E-2   1.1930E-2   8.5843E-3   5.9604E-3   3.8451E-3
Abstract. Integrating reversible systems with symmetric methods is well known to offer many advantages. The correct qualitative behaviour is reproduced, which also leads to quantitatively advantageous properties of the errors and their growth with time. In particular, fixed-stepsize symmetric linear multistep methods especially designed for second order differential equations can integrate periodic or quasiperiodic orbits very efficiently over long times. We study what happens when variable stepsizes are considered so as to deal with highly eccentric orbits.



1 Introduction 

In recent years, an effort has been made to construct and analyse methods especially designed to integrate eccentric orbits with variable stepsizes. In particular, symmetric variable-stepsize one-step methods have been shown to lead to slow growth of error with time when integrating periodic orbits of reversible systems [3], and some efficient numerical techniques have been designed [8] [9] which take advantage of this property when the system to be integrated is also Hamiltonian. See also [1] for numerical comparisons among the different integrators and a more detailed analysis of their error growth with time.

On the other hand, some symmetric fixed-stepsize linear multistep methods for second order differential equations of the type (2) have also been proved to lead to slow error growth with time for periodic orbits of reversible systems. (The key property these methods must satisfy for this is that the first characteristic polynomial has 1 as its only double root and all its other roots are simple [4].) High-order explicit methods of this type have been suggested in [10], and they are very efficient when integrating orbits that are not too eccentric, as they need just one function evaluation per step, in contrast with the many more needed by one-step methods of the same order. It was also proved in [4] that symmetric linear multistep methods for first order differential equations lead to unstable numerical solutions, except in some very particular cases described in [6]. That is why we concentrate in this paper on symmetric linear multistep methods for second order differential equations, which we will also denote by symmetric LMM2’s.



L. Vulkov, J. Wasniewski, and P. Yalamov (Eds.): NAA 2000, LNCS 1988, pp. 144—152, 2001. 
(c) Springer- Verlag Berlin Heidelberg 2001 
The aim of this paper is to begin a study of the construction and numerical behaviour of symmetric LMM2’s when variable stepsizes are considered in order to deal with highly eccentric orbits. The techniques used in [8] to generalize numerical integrators to variable stepsizes, by considering a suitable change of variables in time and modifications of Hamiltonians, are not applicable here, as the resulting initial value problem would not be of the form (2) and therefore we would not be able to integrate it with an LMM2. Therefore, in this paper, we consider a natural generalization of LMM2’s to variable stepsizes, which is described in Section 2. Also in that section, necessary and sufficient conditions on the coefficients of the methods are given for symmetry. In Section 3, an explicit second-order variable-stepsize symmetric LMM2 is constructed. It is not meant to be an optimal method, but a first example of a procedure for constructing integrators of this type. In order to prove convergence for this particular integrator, consistency and stability are required. The former is given by the way the method is constructed. The latter is proved in Section 4 under mild conditions. Finally, in Section 5, some numerical results are described in order to see whether the advantageous error growth also applies to this method when variable stepsizes are considered.

2 Symmetry Conditions for Variable-Stepsize LMM2’s 

A fixed-stepsize linear $k$-step method for second order differential equations is defined by a difference equation of the form

$$\alpha_k y_{n+k} + \cdots + \alpha_0 y_n = h^2\,[\beta_k f_{n+k} + \cdots + \beta_0 f_n], \qquad n \ge 0, \qquad (1)$$

and $k$ starting values $y_0, y_1, \ldots, y_{k-1}$. In formula (1), $h$ is the stepsize of the method, $\{y_m\}_{m\in\{0,1,2,\ldots\}}$ are the numerical approximations to the solution of an initial value problem of the form

$$\ddot y(t) = f(y(t)), \qquad y(t_0) = u_0, \qquad \dot y(t_0) = \dot u_0, \qquad (2)$$

at times $\{t_m = t_0 + m h\}_{m\in\{0,1,2,\ldots\}}$, $\{f_m\}$ denotes $\{f(y_m)\}$ for the function $f$ in (2), and $\{\alpha_l\}_{l=0}^{k}$, $\{\beta_l\}_{l=0}^{k}$ are constant coefficients (depending neither on the problem (2) considered nor on the stepsize $h$) which determine the method.
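As a concrete instance of (1), the classical Störmer method $y_{n+2} - 2y_{n+1} + y_n = h^2 f_{n+1}$ is a symmetric explicit method with $k = 2$. The sketch below (Python, chosen here for illustration; the test problem $\ddot y = -y$ and the function name `stormer` are assumptions, not taken from the paper) applies it with fixed stepsize.

```python
import math

def stormer(f, y0, y1, h, n_steps):
    """Two-step instance of (1): y_{n+2} - 2 y_{n+1} + y_n = h^2 f(y_{n+1}).

    Returns the list [y_0, y_1, ..., y_{n_steps}].
    """
    ys = [y0, y1]
    for _ in range(n_steps - 1):
        # One function evaluation per step, as for any explicit LMM2.
        ys.append(2.0 * ys[-1] - ys[-2] + h * h * f(ys[-1]))
    return ys

h = 1.0e-3
n_steps = 1000

# Harmonic oscillator y'' = -y with y(0) = 1, y'(0) = 0; exact solution cos(t).
# The second starting value is taken from the exact solution.
ys = stormer(lambda y: -y, 1.0, math.cos(h), h, n_steps)
error = abs(ys[-1] - math.cos(n_steps * h))
print(error)
```

Here $\alpha_2 = \alpha_0 = 1$, $\alpha_1 = -2$ and $\beta_1 = 1$, $\beta_0 = \beta_2 = 0$, so the coefficients are invariant under the reversal $j \mapsto k - j$.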

A natural generalization of methods of this type to variable stepsizes is to consider the following change in (1):

$$\alpha_k(h_n,\ldots,h_{n+k-1})\,y_{n+k} + \cdots + \alpha_0(h_n,\ldots,h_{n+k-1})\,y_n = h_{n+k-1}^2\,[\beta_k(h_n,\ldots,h_{n+k-1})\,f_{n+k} + \cdots + \beta_0(h_n,\ldots,h_{n+k-1})\,f_n]. \qquad (3)$$

Now $\{y_m\}_{m\in\{0,1,2,\ldots\}}$ denotes the approximations to the solution of (2) at times $\{t_m = t_0 + h_0 + \cdots + h_{m-1}\}_{m\in\{0,1,2,\ldots\}}$. So $h_m$ is the stepsize used to go from $y_m$ to $y_{m+1}$, and the coefficients of the method can no longer be constants but are functions of the stepsizes taken by the method in each particular case.

For fixed-stepsize methods, it is well known that the following conditions are necessary and sufficient to get a symmetric stable LMM2 [11]:

$$\alpha_j = \alpha_{k-j}, \qquad \beta_j = \beta_{k-j}, \qquad j = 0, \ldots, k.$$

For their variable-stepsize counterparts, the following are sufficient conditions for symmetry:

$$\alpha_j(h_0,\ldots,h_{k-1}) = \alpha_{k-j}(h_{k-1},\ldots,h_0), \qquad \beta_j(h_0,\ldots,h_{k-1}) = \frac{h_0^2}{h_{k-1}^2}\,\beta_{k-j}(h_{k-1},\ldots,h_0), \qquad j = 0, \ldots, k. \qquad (4)$$

This is easily verified. If the method takes $y_n, \ldots, y_{n+k-1}$ to $y_{n+k}$ by (3), then, by using (4), the following difference equation is satisfied:

$$\alpha_0(h_{n+k-1},\ldots,h_n)\,y_{n+k} + \cdots + \alpha_k(h_{n+k-1},\ldots,h_n)\,y_n = h_n^2\,[\beta_0(h_{n+k-1},\ldots,h_n)\,f_{n+k} + \cdots + \beta_k(h_{n+k-1},\ldots,h_n)\,f_n], \qquad (5)$$

which says that the method would take $y_{n+k}, \ldots, y_{n+1}$ to $y_n$ if the same stepsizes had been taken in reverse order. To ensure that the method would have taken the same stepsizes when integrating backwards, it is also necessary to require that the stepsize going forward from a numerical approximation $y_n$ to $y_{n+1}$ is the same as the one going backwards from $y_{n+1}$ to $y_n$. (The same happens for one-step methods [3].) In other words, if the stepsize taken is just a function $H(y_n, y_{n+1})$ of the point of departure and the point of arrival, it must be verified that

$$H(y_n, y_{n+1}) = H(y_{n+1}, y_n).$$

This holds, for example, if the following arithmetic mean is considered:

$$h_n = \frac{\varepsilon}{2}\,[s(y_n) + s(y_{n+1})], \qquad (6)$$

for any function $s$. (This function will be chosen so that the stepsize $h_n$ is suitable for the integration of the problem.)
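Because $y_{n+1}$ itself depends on $h_n$, a symmetric stepsize rule of the arithmetic-mean type is an implicit equation for $h_n$. One simple way to solve it is fixed-point iteration. The following is a minimal sketch in Python with NumPy under stated assumptions: the propagator `verlet_step` is only a placeholder one-step map (velocity Verlet for Kepler's problem) standing in for the actual integrator, the function `s` is a hypothetical choice, and `eps` plays the role of the tolerance multiplying the mean.

```python
import numpy as np

def s(q):
    # Hypothetical step-density function: smaller near the centre of attraction.
    return np.linalg.norm(q) ** 1.5

def verlet_step(q, v, h):
    # Placeholder propagator (velocity Verlet for q'' = -q/|q|^3);
    # it is NOT the multistep method discussed in the text.
    acc = lambda x: -x / np.linalg.norm(x) ** 3
    v_half = v + 0.5 * h * acc(q)
    q_new = q + h * v_half
    v_new = v_half + 0.5 * h * acc(q_new)
    return q_new, v_new

def symmetric_stepsize(q, v, eps, rtol=1e-12, max_iter=50):
    """Solve h = (eps/2) * (s(q_n) + s(q_{n+1}(h))) by fixed-point iteration."""
    h = eps * s(q)  # initial guess: pretend s(q_{n+1}) = s(q_n)
    for _ in range(max_iter):
        q_new, _ = verlet_step(q, v, h)
        h_next = 0.5 * eps * (s(q) + s(q_new))
        if abs(h_next - h) <= rtol * abs(h_next):
            return h_next
        h = h_next
    return h

eps = 1.0e-2
q0 = np.array([0.1, 0.0])              # pericentre of an orbit with eccentricity 0.9
v0 = np.array([0.0, np.sqrt(19.0)])
h0 = symmetric_stepsize(q0, v0, eps)
q1, v1 = verlet_step(q0, v0, h0)
residual = abs(h0 - 0.5 * eps * (s(q0) + s(q1)))
print(h0, residual)
```

The rule is symmetric by construction, since swapping the two end points leaves the mean unchanged. For small `eps` the iteration is a strong contraction, so a handful of sweeps suffices, and no extra evaluations of the right-hand side are needed when the propagator is a linear multistep formula.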

3 Construction of an Explicit Second-Order 
Variable-Stepsize Symmetric LMM2 

Looking for a second-order explicit method which verifies the conditions in Section 2, we have seen that $k = 2$ leads to a fixed-stepsize method and $k = 3$ to nearly the same conclusion, as only two different stepsizes would be possible in this case. Therefore, we have to look at 4-step methods to get a genuinely stepsize-adaptive method.

In such a case, we have eight unknowns $\alpha_0, \ldots, \alpha_4, \beta_1, \beta_2, \beta_3$ for given stepsizes, and there would be four order conditions. However, as the symmetry



Variable Stepsizes in Symmetric Linear Multistep Methods 147 



conditions are difficult to treat by themselves because the kind of dependence on the stepsizes is not established, we have forced symmetry by writing the order conditions in forward and backward forms and then assuming the conditions (4). More explicitly, we have made formula (3) exact for the polynomials $y(t) = t^m$ ($m = 0, 1, 2, 3$), when going from $t_0 = 0$ to $t_4 = h_0 + h_1 + h_2 + h_3$, and when going from $t_4$ to $t_0$. In this way, a linear system of seven equations in eight unknowns turns up. Using MATHEMATICA, we have solved it and found that



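$$\alpha_4 = -\alpha_1\,\frac{h_0}{h_0+h_1+h_2+h_3} - \alpha_2\,\frac{h_0+h_1}{h_0+h_1+h_2+h_3} - \alpha_3\,\frac{h_0+h_1+h_2}{h_0+h_1+h_2+h_3},$$

$$\alpha_0 = -\alpha_3\,\frac{h_3}{h_0+h_1+h_2+h_3} - \alpha_2\,\frac{h_2+h_3}{h_0+h_1+h_2+h_3} - \alpha_1\,\frac{h_1+h_2+h_3}{h_0+h_1+h_2+h_3}, \qquad (7)$$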



Pi 

02 




asgz{ho, hi, h 2 , hs) — 0252 (^ 0 : hi, / 12 , hs) — aigi{ho, hi, / 12 , hs) 



- 03 - 



hi 



- asPsiho, hi,h2, hs) - 02P2(ho, hi, /12, hs) 



-aipi{ho, hi, h2, hs). 



(8) 



where all the coefficients correspond to the method going from $t = 0$ to $t = h_0 + h_1 + h_2 + h_3$ in this order, and where $\{g_i\}_{i=1,2,3}$ and $\{p_i\}_{i=1,2,3}$ are rational functions of the stepsizes.

By taking $\alpha_2 = 2$, $\alpha_1 = \alpha_3 = -2$, the first characteristic polynomial for fixed stepsize would be

$$\rho(x) = x^4 - 2x^3 + 2x^2 - 2x + 1.$$

This polynomial has simple roots $\pm i$ and double root 1. Therefore, every LMM2 which has this as its first characteristic polynomial and which is implemented with fixed stepsize would lead to linear error growth with time for a wide class of problems including Kepler’s, in contrast with more general methods, which would lead to quadratic error growth [4].
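The root structure of this polynomial is easy to confirm numerically. The sketch below (Python with NumPy, used here for illustration) evaluates the polynomial and its derivative at $1$ and $\pm i$. It also recovers the two outer coefficients from the inner ones by imposing exactness of the left-hand side of (3) for $y = 1$ and $y = t$, which is an elementary consequence of the order conditions; the helper name `outer_alphas` is hypothetical.

```python
import numpy as np

# Coefficients of x^4 - 2x^3 + 2x^2 - 2x + 1, highest degree first.
rho = np.array([1.0, -2.0, 2.0, -2.0, 1.0])
d_rho = np.polyder(rho)

# rho vanishes at 1, i and -i ...
values_at_roots = [np.polyval(rho, z) for z in (1.0, 1j, -1j)]
# ... its derivative vanishes at 1 (double root) but not at i (simple root).
derivative_at_1 = np.polyval(d_rho, 1.0)
derivative_at_i = np.polyval(d_rho, 1j)

def outer_alphas(h, alpha_inner):
    """alpha_0 and alpha_4 from sum(alpha_j) = 0 and sum(alpha_j * t_j) = 0."""
    t = np.concatenate(([0.0], np.cumsum(h)))  # t_0 = 0, t_1, ..., t_4
    a1, a2, a3 = alpha_inner
    alpha_4 = -(a1 * t[1] + a2 * t[2] + a3 * t[3]) / t[4]
    alpha_0 = -(a1 + a2 + a3) - alpha_4
    return alpha_0, alpha_4

# With four equal stepsizes the outer coefficients reduce to those of rho.
alpha_0, alpha_4 = outer_alphas(np.ones(4), (-2.0, 2.0, -2.0))
print(values_at_roots, derivative_at_1, derivative_at_i, alpha_0, alpha_4)
```

For equal stepsizes both outer coefficients equal 1, in agreement with the leading and constant coefficients of $\rho$.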

Substituting these values of {ai}^^i in the formulas (8) for {/3i}?^i, we have 
found that a possible symmetric choice for {0i}i^i is 



01 



02 



— 2/13/12 — 2 h\ho — 4/13/1I + 2/14/12^3 — 2/10/12/13 + 2/ii/i| 
6/ii/i| 



^{02) 

6/11/12/13 ’ 



where 



n{.02) = 2/iq/ii -I- 2/iq/ii/i3 -I- 4/iq/ii + 4 / 10 / 14/12 + 2hoh\hs — 2h\h2 -b 6 /iq/ii/i2^3 

— 6/I4/I2 ~b 2 /loh. 2^3 4/14/12^.3 “b 4 :h^hs ~b 2 /io/^ 2^3 ~b 2 /l 2/^3 — 2 hih^, 

— 2/lp/l4 — 2 /Iq/i 3 — 4/10/I4 -b 2/12/14/10 — 2/10/14/13 -b 2/12/I4 

^ 6/12/10 



We have found these coefficients by assuming that 6 /i 2 /iq /33 has a homogeneous 
third-degree polynomial expression on the stepsizes and selecting a choice from 
all the possible ones which make (8) possible apart from the symmetry condition between $\beta_1$ and $\beta_3$ (4).



4 Analysis of Stability of the Previous Method 



Stability of variable-stepsize LMMs is not as easy an issue as stability of variable-stepsize one-step methods. There are some results, however, for particular cases such as Adams or Störmer methods [5] [2], but none for our particular method. Therefore, we proceed to study its stability.

The first step is to express the method as a one-step integrator by considering a wider phase space. We introduce the variables $\{v_n\}_{n\in\{0,1,\ldots\}}$ defined by

$$y_{n+1} = y_n + h_n v_n. \qquad (9)$$



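In this way, it can be seen that the following difference equation is satisfied by $\{v_n\}$:

$$v_{n+3} = \frac{h_{n+1}+h_{n+3}}{h_n+h_{n+2}}\,\frac{h_{n+2}}{h_{n+3}}\,v_{n+2} - \frac{h_{n+1}}{h_{n+3}}\,v_{n+1} + \frac{h_{n+1}+h_{n+3}}{h_n+h_{n+2}}\,\frac{h_n}{h_{n+3}}\,v_n + h_{n+3}\,\frac{h_n+h_{n+1}+h_{n+2}+h_{n+3}}{2(h_n+h_{n+2})}\,(\beta_1 f_{n+1} + \beta_2 f_{n+2} + \beta_3 f_{n+3}). \qquad (10)$$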



In fact, (9) and (10) are the equations used to implement the method in practice, 
as this process (also called stabilization [7]) leads to much lower errors when the 
stepsize is small (roundoff errors are diminished significantly). 

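Considering then the augmented vector $Y_n = [y_{n+3}, v_{n+2}, v_{n+1}, v_n]^T$, the following recurrence relation is satisfied:

$$Y_{n+1} = A_n Y_n + h_{n+3}\,F_n(Y_n), \qquad (11)$$

where

$$A_n = \begin{pmatrix} 1 & \dfrac{h_{n+1}+h_{n+3}}{h_n+h_{n+2}}\,h_{n+2} & -h_{n+1} & \dfrac{h_{n+1}+h_{n+3}}{h_n+h_{n+2}}\,h_n \\ 0 & \dfrac{h_{n+1}+h_{n+3}}{h_n+h_{n+2}}\,\dfrac{h_{n+2}}{h_{n+3}} & -\dfrac{h_{n+1}}{h_{n+3}} & \dfrac{h_{n+1}+h_{n+3}}{h_n+h_{n+2}}\,\dfrac{h_n}{h_{n+3}} \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix},$$

$$F_n(Y_n) = \begin{pmatrix} h_{n+3}\,\dfrac{h_n+h_{n+1}+h_{n+2}+h_{n+3}}{2(h_n+h_{n+2})}\,(\beta_1 f_{n+1} + \beta_2 f_{n+2} + \beta_3 f_{n+3}) \\ \dfrac{h_n+h_{n+1}+h_{n+2}+h_{n+3}}{2(h_n+h_{n+2})}\,(\beta_1 f_{n+1} + \beta_2 f_{n+2} + \beta_3 f_{n+3}) \\ 0 \\ 0 \end{pmatrix}.$$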
By considering successive products of the matrices ($A_n \cdots A_{j+1}$), it can be proved that the infinity norm of such a product is bounded independently of the number of factors $n - j$, whenever every stepsize $h_l$ satisfies

$$h_l > c\,h_{l-2m}, \quad m = 1, 2, \ldots, \qquad h_l < h, \quad l = 1, 2, \ldots$$

for some constants $c$ and $h$. (These conditions are easily satisfied for $c$ small and $h$ large enough.)

The method is stable if, whenever (11) and the initial value $Y_0$ are slightly modified, the vectors $\{Y_n\}$ obtained are only modified accordingly. This follows from the bound just mentioned for the products of the matrices $A_l$. A modification of (11) would be

$$\tilde Y_{n+1} = A_n \tilde Y_n + h_{n+3}\,(F_n(\tilde Y_n) + \nu_n).$$

So, if $E_n$ denotes $\tilde Y_n - Y_n$, we have that

$$E_{n+1} = A_n E_n + h_{n+3}\,(F_n(\tilde Y_n) - F_n(Y_n) + \nu_n),$$

and therefore

$$E_{n+1} = A_n \cdots A_0 E_0 + \sum_{j=0}^{n} (A_n \cdots A_{j+1})\,h_{j+3}\,[F_j(\tilde Y_j) - F_j(Y_j) + \nu_j],$$

where the product $A_n \cdots A_{j+1}$ is understood as the identity for $j = n$. Now if $F_n$ is Lipschitz with respect to $Y_n$ (as in fact happens if $f$ is and $h_n < C h_{n-1}$ for some constant $C$), the following bound is verified:

$$\|E_{n+1}\|_\infty \le K\big(\|E_0\|_\infty + (T_f - t_0)\max_j \|\nu_j\|_\infty\big) + K L \sum_{j=0}^{n} h_{j+3}\,\|E_j\|_\infty,$$

where $K$ is the bound for the products $A_n \cdots A_{j+1}$, $T_f$ is the final time of integration and $L$ is the Lipschitz constant of $F_n$. A discrete Gronwall argument then says that

$$\|E_{n+1}\|_\infty \le K e^{K L (T_f - t_0)}\big(\|E_0\|_\infty + (T_f - t_0)\max_j \|\nu_j\|_\infty\big),$$

which proves stability under the given assumptions.



5 Numerical Experiments and Error Growth with Time 

We have implemented the method in Section 3 using stabilization [7] and compensated summation, in order to reduce roundoff error as much as possible. We have integrated Kepler’s problem

$$\ddot x_i = -\frac{x_i}{(x_1^2 + x_2^2)^{3/2}}, \qquad i = 1, 2,$$



150 



B. Cano 



with initial positions and velocities given by $x_1(0) = 1 - ecc$, $x_2(0) = 0$, $\dot x_1(0) = 0$, $\dot x_2(0) = \sqrt{\frac{1+ecc}{1-ecc}}$. The solution describes the orbit of a satellite around a planet in the form of an ellipse of eccentricity $ecc$. Here we have taken $ecc = 0.9$, as these variable-stepsize methods are constructed to integrate problems where a great variability in the solution makes it suitable to treat some parts with more care than others.
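For the standard Kepler initial data at the pericentre (position $(1 - ecc, 0)$, velocity $(0, \sqrt{(1+ecc)/(1-ecc)})$), the invariants of the orbit can be computed in closed form. The sketch below (Python, used here for illustration; the variable names are hypothetical) evaluates the energy, the angular momentum and the resulting period for $ecc = 0.9$.

```python
import math

ecc = 0.9
x = (1.0 - ecc, 0.0)                                   # initial position (pericentre)
v = (0.0, math.sqrt((1.0 + ecc) / (1.0 - ecc)))        # initial velocity

r = math.hypot(*x)
energy = 0.5 * (v[0] ** 2 + v[1] ** 2) - 1.0 / r       # kinetic + potential
angular_momentum = x[0] * v[1] - x[1] * v[0]

# For a bound Kepler orbit: a = -1/(2E) and period = 2*pi*a^(3/2).
semi_major_axis = -1.0 / (2.0 * energy)
period = 2.0 * math.pi * semi_major_axis ** 1.5

print(energy, angular_momentum, period)
```

The energy is $-1/2$ for every eccentricity, so the semi-major axis is 1 and the period is $2\pi$, while the angular momentum $\sqrt{1 - ecc^2}$ shrinks as the orbit becomes more eccentric. At the pericentre the speed is about 4.36, against about 0.23 at the apocentre, which is why a uniform stepsize is wasteful here.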

The stepsize function has been chosen according to (6), solving iteratively 
this equation to a given relative tolerance 10“^. The function s has been chosen 



This choice was suggested in [3]. (It represents the time of a free fall into the centre from the current configuration.) It leads to smaller stepsizes at the pericentre and larger ones at the apocentre, which is reasonable given the velocity of the satellite in each case. Some other possible choices of $s$ led to bigger errors, so we decided to take this one for our experiments.

As was proved in [4], the method considered in this paper, when implemented with fixed stepsizes, leads to linear error growth with time for this problem. The main objective of this paper is to see whether the same phenomenon occurs for variable stepsizes. Our numerical experiments allow us to say that it does. Figure 1 shows how the error grows with time when we measure the error at final times $10T, 30T, \ldots, 21870T$, $T = 2\pi$ being the period of the problem. Each line corresponds to a different tolerance $\varepsilon$ in (6). More explicitly, $\varepsilon = 2\pi$ 10“® and $\varepsilon = \pi$ 10“®. It can be seen that, when the final time is multiplied by 3, the errors are multiplied by the same number. Order 2 is also manifest in this figure, as the errors corresponding to the same final time are divided by approximately 4 when the stepsize is halved.

6 Conclusions 

For the variable-stepsize symmetric LMM2 constructed, the same advantageous error growth is observed as for its fixed-stepsize counterpart. The question arises of whether this happens for every generalization of the same kind.

The great advantage that methods of this type can have over Runge-Kutta ones is that just one function evaluation per step is needed when they are explicit. Although a symmetric selection of the stepsizes makes the variable-stepsize mode implicit, no more function evaluations are needed for LMMs, while they are needed for explicit Runge-Kutta methods because the stepsize is required in the evaluation of the stages. This fact can make variable-stepsize symmetric LMM2’s interesting for problems with very costly function evaluations. In this case, this part of the computation could be much more expensive than the calculation of the coefficients of the LMM.
Fig. 1. Error growth with time for Kepler’s problem with eccentricity 0.9
Abstract. In this note we propose a multigrid approach to the solution of (multilevel) banded circulant linear systems. In particular, we discuss how to define a “coarse-grid projector” such that the projected matrix at lower dimension preserves the circulant structure. This approach naturally leads to an optimal algorithm whose cost is linear in the size $N$ of the system, thus improving on the classical one based on Fast Fourier Transforms (FFTs), which costs $O(N \log N)$ arithmetic operations (ops). It is worth mentioning that these banded circulants are used as preconditioners for elliptic and parabolic PDEs (with Dirichlet or periodic boundary conditions) and for some 2D image restoration problems where the point spread function (PSF) is numerically banded. Therefore the use of the proposed multigrid technique reduces the overall cost from $O(k(\varepsilon,n)\,N \log N)$ to $O(k(\varepsilon,n)\,N)$, where $k(\varepsilon,n)$ is the number of Preconditioned Conjugate Gradient (PCG) iterations needed to reach the solution within a given accuracy $\varepsilon$. The full analysis of convergence and the related numerical experiments are reported in a forthcoming paper [18].

Keywords: Circulant matrices, two-grid and multigrid iterations. 

AMS(MOS) Subject Classification: 65F10, 65F15. 



1 Prelude 

Let $f$ be a $d$-variate trigonometric polynomial defined over the hypercube $Q^d$, with $Q = (0, 2\pi)$ and $d \ge 1$, and having degree $c = (c_1, c_2, \ldots, c_d)$, $c_j \ge 0$, with regard to the variables $s = (s_1, s_2, \ldots, s_d)$. From the Fourier coefficients of $f$

$$a_j = \frac{1}{(2\pi)^d}\int_{Q^d} f(s)\,e^{-i(j,s)}\,ds, \qquad i^2 = -1, \quad j = (j_1, \ldots, j_d) \in \mathbb{Z}^d, \qquad (1)$$

with $(j,s) = \sum_{k=1}^{d} j_k s_k$, $n = (n_1, \ldots, n_d)$ and $N(n) = \prod_{k=1}^{d} n_k$, one can build the sequence of Toeplitz matrices $\{T_n(f)\}$, where $T_n(f) = \{a_{j-i}\}_{i,j=e^T}^{n} \in \mathbb{C}^{N(n)\times N(n)}$, $e = (1, \ldots, 1)^T \in \mathbb{N}^d$, is said to be the Toeplitz matrix of order $n$ generated by $f$ (see [20]).

It is clear that $a_j = 0$ if there exists an index $i$ such that the absolute value of $j_i$ exceeds $c_i$ (i.e. if the condition $|j| \le c$ is violated).

Accordingly, the d-level circulant matrix of order N{n) generated by the same 
polynomial / (see e.g. [20]) is defined as 

^n(/)=E«^^n= E E (2) 

bl<c bl|<ci \jd\<Cd 

where the matrix is the cyclic permutation Toeplitz matrix that generates 
the unilevel circulants, i.e. {Zm)s,t = {t ~ s) mod m. If denotes the m-th 
unilevel Fourier matrix whose {s,t) entry is given by the quantity (e‘^^ 
then it is well known that 5'„(/) = where F„ = Fn^ ® ® is 

the d-level Fourier matrix and = DiagQ<j<„_g/(27rj'/n). Here the relation 
0 ^ j ^ n — and the expression 27rj'/n = 27r(ji/ni, . . . ,jd/nd) are intended 
componentwisely. 

Under the assumption that $c_i \le \lfloor (n_i - 1)/2 \rfloor$, the matrix $S_n(f)$ is the Strang or natural circulant preconditioner of the corresponding Toeplitz matrix $T_n(f)$ [5]. We observe that this assumption is eventually fulfilled, since each $c_i$ is a fixed constant and $n_i$ is the size at level $i$, which it is natural to think of as large when considering a discretization process.

Now let Cn{f) be the n-th Cesaro sum of / given by 



Cnif) 




t=(o.....o) 



mi 

N{n) 



with {Fj{f)){s) = being the j-th Fourier expansion of /. Then 

degree(c„(/)) = degree(/) and Cn{f) = *S'„(c„(/)) is the T. Chan [6] optimal 
preconditioner of Tn{f). 

Besides Toeplitz linear systems (see for instance [4]), these banded circulant preconditioners have been used for preconditioning Finite Difference discretizations of PDEs over hyperrectangular domains [2,11,12,14,15]. In this case, by the consistency condition, it is known that $f$ is a nonnegative trigonometric polynomial which vanishes at $x = 0$ and can be chosen positive elsewhere (see e.g. [16]). Therefore $S_n(f)$ is singular, so that it is usually replaced by

S4/) = S4/)+(,™ld(^))i^ (3) 

which is positive definite and can be used as preconditioner. 

On the other hand, $C_n(f)$ is always positive definite since $c_n(f)$, with $f$ being a nonnegative polynomial, can vanish if and only if $f$ is identically zero (see [17]).

However, the clustering properties related to the modified Strang preconditioner are better than those of the optimal one in the case of nonnegative generating functions with zeros (for a rigorous analysis of this phenomenon refer to [19,7]).

Now if we consider PCG-like methods for the solution of a linear system $A_n x = b$, then the cost per iteration is given by

a. solution of $P_n y = c$ with the preconditioner $P_n$,

b. a constant number of matrix-vector products with matrix $A_n$,

c. computations of lower order complexity (vector-vector operations etc.).

In the case where $A_n = T_n(f)$, the overall cost of b. and c. is $O(N(n))$ arithmetic operations (ops) due to the bandedness of $T_n(f)$, while the cost of a. is $O(N(n) \log N(n))$ ops due to the use of FFTs.

The method of multigrid type that we propose in this paper reduces the cost of a. to $O(N(n))$ ops when $P_n$ is circulant and banded (for a proof of this claim see [18]).
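For reference, the classical FFT-based solve in step a. can be sketched as follows. This is a minimal unilevel example in Python with NumPy (a tool chosen here for illustration). The symbol $f(x) = 2.5 - 2\cos x$ is an assumption made to obtain a small, positive definite, banded circulant; it is not taken from the paper.

```python
import numpy as np

n = 64
x = 2.0 * np.pi * np.arange(n) / n

# Eigenvalues of the circulant: samples f(2*pi*j/n) of the assumed symbol.
eigs = 2.5 - 2.0 * np.cos(x)

# First column of the circulant (inverse DFT of the eigenvalues); since f is
# a cosine polynomial of degree 1, only three entries are nonzero (banded).
first_col = np.real(np.fft.ifft(eigs))

# Dense matrix, built only to verify the result: P[s, t] = first_col[(s - t) mod n].
idx = (np.arange(n)[:, None] - np.arange(n)[None, :]) % n
P = first_col[idx]

rng = np.random.default_rng(0)
c = rng.standard_normal(n)

# Solve P y = c with two FFTs and one pointwise division: O(n log n) operations.
y = np.real(np.fft.ifft(np.fft.fft(c) / eigs))

print(np.max(np.abs(P @ y - c)))
```

The two transforms dominate the cost. The bandedness of the matrix is not exploited at all by this approach, and exploiting it is what a multigrid solver with linear cost aims at.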

Indeed the technique can be also extended [18] to the case where P„ = 
Pn + Pn being circulant and banded, being a 

special p rank corrections with fq denoting the g-th Fourier column of P„. Of 
course, this extension is of interest since it allows to treat the case of the modified 
Strang preconditioner given in (3). 

The paper is organized as follows. In Section 2 we recall definitions and basic 
results concerning two-grid and multigrid iterations. Then, in Section 3, we define 
our multigrid technique for unilevel and multilevel circulants and we analyze in 
detail the properties of the projected “coarse-grid operators” . A short Section 4 
of conclusions ends the paper. 

2 Premises 

Consider the iterative method

$$x^{(j+1)} = V_n x^{(j)} + b_1 := \mathcal{V}_n(x^{(j)}, b_1) \qquad (4)$$

for the solution of the linear system $A_n x = b$, where $A_n, M_n, V_n := I_n - M_n^{-1}A_n \in \mathbb{C}^{n\times n}$ and $x, b, b_1 := M_n^{-1} b \in \mathbb{C}^n$. Given a full-rank matrix $p_n^k \in \mathbb{C}^{n\times k}$, with $k < n$, a Two-Grid Method (TGM) is defined by the following algorithm [10]:

$TGM(V_n, p_n^k)(x^{(j)})$

1. $d_n = A_n x^{(j)} - b$

2. $d_k = (p_n^k)^H d_n$

3. $A_k = (p_n^k)^H A_n p_n^k$

4. Solve $A_k y = d_k$

5. $\hat x^{(j)} = x^{(j)} - p_n^k y$

6. $x^{(j+1)} = \mathcal{V}_n^{\nu}(\hat x^{(j)}, b_1)$

Step 6. consists in applying the “smoothing iteration” (4) $\nu$ times, while steps 1. → 5. define the “coarse grid correction”, that depends on the projection


Preliminary Remarks on Multigrid Methods for Circulant Matrices 



155 



operator $p_n^k$. The global iteration matrix of the TGM is then given by

$$TGM(V_n, p_n^k) = V_n^{\nu}\left[I_n - p_n^k\big((p_n^k)^H A_n p_n^k\big)^{-1}(p_n^k)^H A_n\right].$$
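The six steps of a two-grid iteration of this kind can be sketched in a few lines. The following is a minimal illustration in Python with NumPy under stated assumptions: the fine matrix is the circulant with symbol $f(x) = 2.1 - 2\cos x$ (shifted so that it is nonsingular), the projector is built from the symbol $p(x) = 2 + 2\cos x$ followed by selection of every other column, and the smoother is relaxed Richardson with parameter $1/\max f$. All of these choices, and all function names, are hypothetical and serve only to make the sketch runnable.

```python
import numpy as np

def circulant_from_symbol(symbol, n):
    """Real symmetric circulant with eigenvalues symbol(2*pi*j/n); symbol must be even."""
    x = 2.0 * np.pi * np.arange(n) / n
    first_col = np.real(np.fft.ifft(symbol(x)))
    idx = (np.arange(n)[:, None] - np.arange(n)[None, :]) % n
    return first_col[idx]

def two_grid_step(A, b, x, proj, omega, nu=1):
    d_n = A @ x - b                    # step 1: fine-grid residual
    d_k = proj.T @ d_n                 # step 2: restrict the residual
    A_k = proj.T @ A @ proj            # step 3: coarse-grid matrix
    y = np.linalg.solve(A_k, d_k)      # step 4: coarse-grid solve
    x = x - proj @ y                   # step 5: coarse-grid correction
    for _ in range(nu):                # step 6: nu smoothing sweeps
        x = x - omega * (A @ x - b)
    return x

n, k = 64, 32
f = lambda x: 2.1 - 2.0 * np.cos(x)    # assumed symbol, positive everywhere
p = lambda x: 2.0 + 2.0 * np.cos(x)    # assumed projector symbol, zero at pi

A = circulant_from_symbol(f, n)
T = np.zeros((n, k))
T[2 * np.arange(k), np.arange(k)] = 1.0   # keeps every other column
proj = circulant_from_symbol(p, n) @ T
omega = 1.0 / 4.1                         # 1 / max of the assumed symbol

rng = np.random.default_rng(0)
b = rng.standard_normal(n)
x = np.zeros(n)
for _ in range(60):
    x = two_grid_step(A, b, x, proj, omega)

rel_residual = np.linalg.norm(A @ x - b) / np.linalg.norm(b)
print(rel_residual)
```

In this sketch the coarse system is solved directly. A multigrid method replaces that direct solve by a recursive call, which is only practical if the coarse matrix has the same structure as the fine one.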



In [8,9], by using specific analytical properties of the generating function $f$, we defined a fast TGM for Toeplitz and $\tau$ problems (the $\tau$ class is the algebra associated to the best-known sine transform [1] and is generated by the Toeplitz matrix $T_n(\cos(s))$). Here we propose the multigrid idea for multilevel banded circulant matrices and, in particular, we define the operator $p_n^k$ and we analyze the projected matrix $A_k = (p_n^k)^H A_n p_n^k$.



3 Multigrid Method for Unilevel Circulant Matrices 

For $A_n = S_n(f)$, with $f$ being a univariate trigonometric polynomial with a unique zero $x^0 \in (0, 2\pi]$, we consider the smoothing iteration (4), where the matrix $V_n$ is defined by $V_n = I_n - A_n/\|f\|_\infty$ so that $V_n = S_n(1 - f/\|f\|_\infty)$. Then (4) takes the form of the relaxed method $x^{(j+1)} = x^{(j)} - [\|f\|_\infty]^{-1}(A_n x^{(j)} - c)$.

In order to provide a general method to obtain projectors from an arbitrary banded circulant matrix $P_n$, for some bandwidth $d$ independent of $n$, we define the operator $T_n^k \in \mathbb{R}^{n\times k}$, $n = 2k$, such that

$$(T_n^k)_{i,j} = \begin{cases} 1 & \text{for } i = 2j - 1,\ j = 1, \ldots, k, \\ 0 & \text{otherwise.} \end{cases} \qquad (5)$$

Given any matrix $P_n$ we obtain a projector $p_n^k \in \mathbb{R}^{n\times k}$ as $p_n^k = P_n T_n^k$.

For $P_n$ too, we define the eigenvalue function $p(x)$, which sets the weights of the frequencies in the projector; in other words, the spectral behaviour of $P_n$ selects the subspace in which the original problem is projected and solved. In this way we set $P_n = S_n(p)$.

If $x^0$ is a zero of $f$, then set $\hat x = (\pi + x^0) \bmod 2\pi$ and take the trigonometric polynomial

$$p(x) = (2 - 2\cos(x - \hat x))^{\lceil \beta/2 \rceil} \sim |x - \hat x|^{2\lceil \beta/2 \rceil} \quad \text{over } (0, 2\pi], \qquad (6)$$

where

$$\beta = \operatorname{argmin}_i \left\{ \lim_{x \to x^0} \frac{(x - x^0)^{2i}}{f(x)} < +\infty \right\}, \qquad (7)$$

$$0 < p^2(x) + p^2(\pi + x). \qquad (8)$$

If $f$ has more than one zero in $(0, 2\pi]$, then the corresponding polynomial $p$ will be the product of the basic polynomials satisfying (6), (7) and (8) for any single zero. Of crucial importance is the following set of simple observations.

Remark 31 Relations (7) and (8) impose some restrictions on the zeros of $f$. First, the zeros of $f$ should be of finite order (by (7)). Secondly, if $x^0$ is a zero of $f$, then $f(\pi + x^0) > 0$; otherwise relationship (8) cannot be satisfied with any polynomial $p$. However the second restriction depends on the fact that we halve the dimension, so that if $f$ has some zeros in $(0, 2\pi]$ located with period $\pi$, then we have to change the “form” of the projection, that is, its smaller dimension. Compare for instance [8] and [3] concerning the case of symmetric Toeplitz structures: indeed in [3], for the generating function $f(x) = x^2(\pi^2 - x^2)$, the authors consider a “block form” of the projector proposed in [8]. This new choice works much better and overcomes a problem due to the position of the zeros of $f(x) = x^2(\pi^2 - x^2)$. Finally we recall that a more general solution to the problem of the position of the zeros can be found in [13], where the author proposes to change the proportionality factor between the matrix sizes of the “finer” and of the “coarser” levels.
Remark 32 If degree($f$) = $c$, i.e., $f(z) = \sum_{j=-c}^{c} a_j z^j$ for some coefficients $a_j$, then $f$ can have a zero of order at most $2c$, so that $\beta \le c$ and therefore degree($p^2$) $\le 2\lceil c/2 \rceil$.

3.1 Properties of $T_n^k$

First of all, let us consider a spectral decomposition of $T_n^k$. In analogy with the $\tau$ case proposed and analyzed in [8,9], the operator $T_n^k$ represents a spectral link between the space of the frequencies of size $n$ and the corresponding space of frequencies of size $k$. Indeed, by observing that

$$[T_n^k]^T f_\mu^{[n]} = \frac{1}{\sqrt{2}} f_\mu^{[k]}, \quad \mu = 0, \dots, k-1,$$

and

$$[T_n^k]^T f_{\mu'}^{[n]} = \frac{1}{\sqrt{2}} f_\mu^{[k]}, \quad \mu' = k, \dots, n-1, \quad \mu' = \mu + k,$$

it directly follows that

$$[T_n^k]^T F_n = \frac{1}{\sqrt{2}} [1, 1] \otimes F_k \qquad (9)$$

where $F_m$ is the unilevel Fourier matrix of size $m$ and $T_n^k$ is the operator defined in (5).

In order to apply a full multigrid method, it is important to preserve the “structure” at the lower levels. Therefore, if we apply the MGM to $A_n := S_n(f)$ with $f$ nonnegative, we require that the matrix at the lower level belongs to the class of circulants of size $k$ with nonnegative eigenvalue function. These and other properties are established in the following proposition. We remark that very similar statements have been proved [13,8,9] for other structures (matrices discretizing elliptic PDEs, $\tau$ matrices etc.).

Proposition 1. Let $n = 2k$, $p_n^k = S_n(p) T_n^k$, and let $f$ be nonnegative.

1. The matrix $2 (p_n^k)^T S_n(f)\, p_n^k$ coincides with $S_k(\hat{f})$ with $\hat{f}(x) = f(x/2)\, p^2(x/2) + f(\pi + x/2)\, p^2(\pi + x/2)$ for $x \in (0, 2\pi]$. If $f$ is a polynomial then $\hat{f}$ is a polynomial having at most the same degree as $f$.

2. If $x^0$ is a zero of $f$ then $\hat{f}$ has a corresponding zero $y^0$ where $y^0 = 2x^0 \bmod 2\pi$.

3. The order of the zero $y^0$ of $\hat{f}$ is exactly the same as the one of the zero $x^0$ of $f$, so that at the lower level the new projector is easily defined in the same way.

Proof. 

The projected matrix $(p_n^k)^T S_n(f)\, p_n^k$ can be spectrally decomposed by taking into account relation (9). Indeed we have

$$(p_n^k)^T S_n(f)\, p_n^k = [T_n^k]^T S_n(p) S_n(f) S_n(p) T_n^k = [T_n^k]^T F_n\, \mathrm{Diag}(p^2 f)\, F_n^H T_n^k = \tfrac{1}{2} F_k \left( \Delta^{[1]} + \Delta^{[2]} \right) F_k^H,$$

where

$$\Delta^{[1]} = \mathrm{Diag}_{0 \le j \le k-1}\big((p^2 f)(x_j^{[n]})\big) \quad \text{and} \quad \Delta^{[2]} = \mathrm{Diag}_{k \le j' \le n-1}\big((p^2 f)(x_{j'}^{[n]})\big).$$

Since $x_j^{[n]} = x_j^{[k]}/2$ for $j = 0, \dots, k-1$ and $x_{j'}^{[n]} = x_j^{[k]}/2 + \pi$ for $j = 0, \dots, k-1$ with $j' = j + k$, it follows that the matrix $2 (p_n^k)^T S_n(f)\, p_n^k$ can be seen as $S_k(\hat{f})$ where $\hat{f}(x) = f(x/2)\, p^2(x/2) + f(\pi + x/2)\, p^2(\pi + x/2)$ for $x \in (0, 2\pi]$.

From the expression of $\hat{f}$ and since $p(\pi + x^0) = 0$ by (6), it directly follows that $y^0 = 2x^0 \bmod 2\pi$ is a zero of $\hat{f}$ (i.e. item 2. is proved).

Moreover, by (8), we deduce that $p^2(x^0) > 0$ since $p^2(x^0 + \pi) = 0$, and the order of the zero of $(p^2 f)(x/2)$ at $y^0$ is the same as the order of $f(x)$ at $x^0$. But by (7) we can see that $p^2(x/2 + \pi)$ has at $y^0$ a zero of order at least equal to the one of $f(x)$ at $x^0$. Since both the contributions in $\hat{f}(x)$ are nonnegative the thesis of item 3. follows.

Finally we have to demonstrate the last part of item 1. Suppose that $f$ is a nonnegative trigonometric polynomial (and then real-valued) of degree $c$. Consequently, by looking at $f$ and $p^2$ as Laurent polynomials on the unit circle, we have $f(z) = \sum_{j=-c}^{c} a_j z^j$, $p^2(z) = \sum_{j=-l}^{l} b_j z^j$, $z = e^{ix}$, $a_j = a_{-j}$, $b_j = b_{-j}$ and $x \in (0, 2\pi]$. By a straightforward calculation we deduce the following representations

$$(p^2 f)(x/2) = \sum_{j=-(c+l)}^{c+l} g_j z^{j/2}, \qquad (p^2 f)(x/2 + \pi) = \sum_{j=-(c+l)}^{c+l} (-1)^j g_j z^{j/2},$$

with $g_j = g_{-j}$, so that $\hat{f}$ is a polynomial since

$$\hat{f}(x) = \sum_{j=-\lfloor (c+l)/2 \rfloor}^{\lfloor (c+l)/2 \rfloor} 2 g_{2j} z^j.$$

Now by Remark 32 we recall that $l$ is at most equal to $2\lceil c/2 \rceil$ and consequently

$$\lfloor (c+l)/2 \rfloor \le \lfloor (c + 2\lceil c/2 \rceil)/2 \rfloor = c$$

so that the second part of item 1. is proved. □
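As a small numerical illustration of Proposition 1 (a sketch under assumed data, not part of the original analysis), take $f(x) = 2 - 2\cos x$, whose only zero in $(0, 2\pi]$ is $x^0 = 2\pi$, so that $\hat{x} = \pi$ and $p(x) = 2 + 2\cos x$. A direct calculation then gives $\hat{f}(x) = 8 - 8\cos x = 4 f(x)$. The helper names and the sizes $n = 8$, $k = 4$ below are illustrative choices.

```python
import math

def circulant(coeffs, n):
    """n x n circulant S_n(g) for g(x) = sum_m coeffs[m] * exp(i*m*x)."""
    return [[sum(v for m, v in coeffs.items() if (i - j - m) % n == 0)
             for j in range(n)] for i in range(n)]

def matmul(a, b):
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

n, k = 8, 4
A = circulant({0: 2, 1: -1, -1: -1}, n)   # S_n(f), f(x) = 2 - 2cos(x)
P = circulant({0: 2, 1: 1, -1: 1}, n)     # S_n(p), p(x) = 2 + 2cos(x)
# T has a 1 in row 2j-1 of column j (1-based), i.e. row 2j of column j (0-based).
T = [[1 if i == 2 * j else 0 for j in range(k)] for i in range(n)]

proj = matmul(P, T)                        # projector p_n^k = P_n T_n^k
coarse = [[2 * v for v in row]
          for row in matmul(transpose(proj), matmul(A, proj))]
expected = circulant({0: 8, 1: -4, -1: -4}, k)   # S_k(f_hat), f_hat = 8 - 8cos(x)

def f(x):
    return 2 - 2 * math.cos(x)

def p(x):
    return 2 + 2 * math.cos(x)

def f_hat(x):
    return (f(x / 2) * p(x / 2) ** 2
            + f(math.pi + x / 2) * p(math.pi + x / 2) ** 2)

print(coarse == expected)                  # coarse matrix is again a circulant
print(abs(f_hat(1.0) - 4 * f(1.0)))        # symbol identity f_hat = 4 f
```

In this example the coarse symbol keeps a zero of order two at the origin and has the same degree as $f$, in agreement with items 1 to 3.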

4 Concluding Remarks 

In the multilevel case the projector is simply defined as $p_n^k = P_n T_n^k$, where $P_n$ is a multilevel circulant matrix generated by a $d$-variate polynomial $p$ satisfying a $d$-variate version of conditions (6), (7) and (8), $n = (n_1, \dots, n_d)$ and $T_n^k = T_{n_1}^{k_1} \otimes \cdots \otimes T_{n_d}^{k_d}$. Under these assumptions a $d$-variate rewriting of Proposition 1 holds true (see [18] for further details).

Concerning the cost per iteration, we observe that steps 1.–3., 5. and 6. in the procedure TGM cost $O(N(n))$ ops due to the bandedness of the involved multilevel matrices. Then the cost of TGM at dimension $n$ is $c(n)$ with $c(n) \le c(n/2) + qN(n)$, with $q > 0$ a constant independent of $n$. The above relation trivially implies $c(n) \le 2qN(n)$ and hence the linear cost of the proposed technique, since the convergence rate is independent of the multiindex $n$ as reported
in [18]. Finally we point out that Proposition 1 and its multilevel generalization 
are crucial in order to define a full multigrid method since they allow a recursive 
application at the lower levels of the procedure TGM. 
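The bound $c(n) \le 2qN(n)$ can be checked by unrolling the recurrence. The sketch below assumes the one-level case $N(n) = n$, equality in the recurrence, and recursion down to size 1; the function name and the value of $q$ are illustrative.

```python
def tgm_cost(size, q=1.0):
    """Unroll c(n) = c(n/2) + q*N(n) with N(n) = n (one-level case)."""
    total = 0.0
    while size >= 1:
        total += q * size   # work done on the current level
        size //= 2          # move to the coarser level
    return total

for n in (2 ** 4, 2 ** 10, 2 ** 20):
    print(n, tgm_cost(n), 2 * n)
```

The geometric sum $n + n/2 + \dots + 1 = 2n - 1$ stays below $2n$, which is the linear cost stated above.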



References 

1. D. Bini and M. Capovani, “Spectral and computational properties of band symmetric Toeplitz matrices”, Linear Algebra Appl., 52/53 (1983), pp. 99-125.

2. R. H. Chan and T. F. Chan, “Circulant preconditioners for elliptic problems”, J. Numer. Linear Algebra Appl., 1 (1992), pp. 77-101.

3. R. H. Chan, Q. Chang and H. Sun, “Multigrid method for ill-conditioned symmetric Toeplitz systems”, SIAM J. Sci. Comp., 19-2 (1998), pp. 516-529.






4. R. H. Chan and M. Ng, “Conjugate gradient methods for Toeplitz systems”, SIAM Rev., 38 (1996), pp. 427-482.

5. R. H. Chan and G. Strang, “Toeplitz equations by conjugate gradients with circulant preconditioner”, SIAM J. Sci. Stat. Comp., 10 (1989), pp. 104-119.

6. T. F. Chan, “An optimal circulant preconditioner for Toeplitz systems”, SIAM J. Sci. Stat. Comp., 9 (1988), pp. 766-771.

7. F. Di Benedetto and S. Serra Capizzano, “A unifying approach to abstract matrix algebra preconditioning”, Numer. Math., 82-1 (1999), pp. 117-142.

8. G. Fiorentino and S. Serra, “Multigrid methods for Toeplitz matrices”, Calcolo, 28 (1991), pp. 283-305.

9. G. Fiorentino and S. Serra, “Multigrid methods for symmetric positive definite block Toeplitz matrices with nonnegative generating functions”, SIAM J. Sci. Comp., 17-4 (1996), pp. 1068-1081.

10. W. Hackbusch, Multigrid Methods and Applications. Springer Verlag, Berlin, 1985.

11. S. Holmgren and K. Otto, “Iterative solution methods and preconditioners for block tridiagonal systems of equations”, SIAM J. Matrix Anal. Appl., 13 (1992), pp. 863-886.

12. S. Holmgren and K. Otto, “Semicirculant preconditioners for first order partial differential equations”, SIAM J. Sci. Comput., 15 (1994), pp. 385-407.

13. T. Huckle, “Multigrid preconditioning and Toeplitz matrices”, private communication.

14. X. Q. Jin and R. Chan, “Circulant preconditioners for second order hyperbolic equations”, BIT, 32 (1992), pp. 650-664.

15. I. Lirkov, S. Margenov and P. Vassilevsky, “Circulant block factorization for elliptic problems”, Computing, 53 (1994), pp. 59-74.

16. S. Serra Capizzano and C. Tablino Possio, “Spectral and structural analysis of high precision Finite Difference matrices for Elliptic Operators”, Linear Algebra Appl., 293 (1999), pp. 85-131.

17. S. Serra, “A Korovkin-type Theory for finite Toeplitz operators via matrix algebras”, Numer. Math., 82-1 (1999), pp. 117-142.

18. S. Serra Capizzano and C. Tablino Possio, “Multigrid methods for multilevel circulant matrices”, manuscript, (2000).

19. E. Tyrtyshnikov, “Circulant preconditioners with unbounded inverses”, Linear Algebra Appl., 216 (1995), pp. 1-23.

20. E. Tyrtyshnikov, “A unifying approach to some old and new theorems on distribution and clustering”, Linear Algebra Appl., 232 (1996), pp. 1-43.



Computing the Inverse Matrix Hyperbolic Sine* 



J. R. Cardoso¹ and F. Silva Leite²

¹ Instituto Superior de Engenharia de Coimbra, Quinta da Nora
3030 Coimbra, Portugal
jocar@sun.isec.pt

² Departamento de Matemática, Universidade de Coimbra
3000 Coimbra, Portugal
fleite@mat.uc.pt



Abstract. We give necessary and sufficient conditions for solvability 
of the matrix equation sinhX = A in the complex and real cases and 
present some algorithms for computing one of these solutions. The nu- 
merical features of the algorithms are analysed along with some numer- 
ical tests. 

Keywords: primary matrix function, inverse matrix hyperbolic sine,
matrix exponentials, logarithms and square roots, Padé approximants



1 Introduction 

The matrix hyperbolic sine of a real or complex square matrix $X$ is defined by $\sinh X := (e^X - e^{-X})/2$. Conversely, if $A$ is given, we call any solution of the matrix equation $\sinh X = A$ an inverse matrix hyperbolic sine of $A$, and denote it by $\sinh^{-1} A$. Since the matrix hyperbolic sine is based on matrix exponentials, it is a primary matrix function ([9], ch. 6). Properties of such functions are the key for obtaining conditions under which the matrix equation $\sinh X = A$ has solutions, in a way which is similar to the matrix equation $e^X = A$, whose solutions are logarithms. The Jordan canonical form also plays an important role in analysing theoretical aspects of the inverse matrix hyperbolic sine, as will be shown later.

The problem of computing solutions of sinh X = A, when A is real, is anal- 
ysed through two algorithms which are a result of careful manipulations of algo- 
rithms for computing matrix logarithms. For the special case when the matrix A 
is P-symmetric, one of them is structure preserving. The general algorithms 
work under a restriction on the spectrum of A. Since skew-symmetric matrices 
may not fit this assumption, they require different treatment. 

The problem of computing matrix logarithms has received particular attention recently ([2,3,4,10,11] and [12]). However, as far as we know, the inverse matrix hyperbolic (or trigonometric) functions have not received the same attention.

* Work supported in part by ISR and research network contract ERB EMRXCT-
970137.

This paper deals only with inverse matrix hyperbolic sines but a similar study may be done for other inverse matrix functions such as $\cosh^{-1}$, $\sin^{-1}$ and $\cos^{-1}$. Our interest in the inverse matrix hyperbolic sine was motivated by the work of Crouch and Bloch [1], where the matrix equation $XQ^T - QX^T = M$ appears associated with the generalized rigid body equations. In this case $Q$ is orthogonal and $M$ is skew-symmetric. It turns out that $X = (e^{\sinh^{-1}(M/2)})Q$ is a solution of that matrix equation.

This paper is organized as follows. In section 2, we present some hyperbolic 
and trigonometric primary matrix functions along with some of their properties. 
In section 3 we give necessary and sufficient conditions for a complex (resp. real) 
matrix to have a complex (resp. real) inverse matrix hyperbolic sine. We also 
make some considerations about the principal inverse matrix hyperbolic sine and 
present some algorithms in section 4. Finally, comments and examples on the 
implementations of the algorithms are given in section 5. 

2 Some Hyperbolic and Trigonometric Matrix Functions

Using the corresponding scalar expressions as in the hyperbolic sine case, we may also define other primary matrix functions such as $\cosh X := (e^X + e^{-X})/2$, $\sin X := (e^{iX} - e^{-iX})/2i$ and $\cos X := (e^{iX} + e^{-iX})/2$, where $X$ is any real or complex square matrix. Some of the identities holding in the scalar case extend to the matrix case under the assumption that the matrices commute. For example, if $XY = YX$, it is easy to prove that

sinh(X ± Y) = sinh X cosh Y ± cosh X sinh Y;
cosh(X ± Y) = cosh X cosh Y ± sinh X sinh Y;
sin(X ± Y) = sin X cos Y ± cos X sin Y;
cos(X ± Y) = cos X cos Y ∓ sin X sin Y.

Setting X = Y in the identities above, it is straightforward that

cosh² X − sinh² X = I;
sin² X + cos² X = I;

sinh(2X) = 2 sinh X cosh X;
sin(2X) = 2 sin X cos X.
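The following sketch checks two of these identities numerically for one 2×2 matrix. It is only an illustration: the matrix exponential is approximated by a truncated Taylor series (adequate for a matrix of small norm), and the test matrix and all helper names are arbitrary choices.

```python
I2 = [[1.0, 0.0], [0.0, 1.0]]

def mul(a, b):
    return [[sum(a[i][t] * b[t][j] for t in range(2)) for j in range(2)]
            for i in range(2)]

def comb(alpha, a, beta, b):
    """Return alpha*a + beta*b for 2x2 matrices."""
    return [[alpha * a[i][j] + beta * b[i][j] for j in range(2)]
            for i in range(2)]

def expm(x, terms=60):
    """Truncated Taylor series of the matrix exponential."""
    result, term = I2, I2
    for m in range(1, terms):
        term = comb(1.0 / m, mul(term, x), 0.0, I2)   # x^m / m!
        result = comb(1.0, result, 1.0, term)
    return result

def sinhm(x):
    return comb(0.5, expm(x), -0.5, expm(comb(-1.0, x, 0.0, I2)))

def coshm(x):
    return comb(0.5, expm(x), 0.5, expm(comb(-1.0, x, 0.0, I2)))

def max_diff(a, b):
    return max(abs(a[i][j] - b[i][j]) for i in range(2) for j in range(2))

X = [[0.3, 1.2], [-0.7, 0.1]]
S, C = sinhm(X), coshm(X)
# cosh^2 X - sinh^2 X = I
err_pythagoras = max_diff(comb(1.0, mul(C, C), -1.0, mul(S, S)), I2)
# sinh(2X) = 2 sinh X cosh X
err_double = max_diff(sinhm(comb(2.0, X, 0.0, I2)), comb(2.0, mul(S, C), 0.0, I2))
print(err_pythagoras, err_double)
```

Both residuals should be at the level of rounding error, since $X$ trivially commutes with itself.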

When $X$ is a P-symmetric or a P-skew-symmetric matrix (i.e., $X^T P = PX$ or $X^T P = -PX$, with $X$, $P$ real and $P$ nonsingular, respectively), we are particularly interested in studying the structure of the image of these matrices by the matrix functions defined above. We note that for the particular case when $P = I$, we get the symmetric and the skew-symmetric matrices, respectively.

Using the definitions, it is easy to prove that the image of a P-symmetric matrix by any primary matrix function is still P-symmetric and that the image of a P-skew-symmetric matrix by the hyperbolic or trigonometric sine is also P-skew-symmetric. However, the image of a P-skew-symmetric matrix by the hyperbolic or trigonometric cosines is P-symmetric.

3 The Equation sinh X = A 

We start with some remarks about the scalar equation $\sinh x = a$, where $a, x \in \mathbb{C}$. Given any $a \in \mathbb{C}$, this equation always has infinitely many solutions. In fact,

$$x = \log[a + (a^2 + 1)^{1/2}] \quad \text{and} \quad x + 2k\pi i \ (k \in \mathbb{Z})$$

satisfy the equation $\sinh x = a$. If

$$a \notin \mathcal{E} = \{\alpha i : \alpha \in \mathbb{R},\ |\alpha| \ge 1\},$$

then $\sinh x = a$ has a unique solution lying on the horizontal open strip in the complex plane defined by

$$\mathcal{D} = \{x \in \mathbb{C} : -\tfrac{\pi}{2} < \mathrm{Im}(x) < \tfrac{\pi}{2}\}.$$

The key idea to show this fact is to observe that the real part of $a + (a^2 + 1)^{1/2}$, where $w^{1/2}$ denotes the complex square root that lies on the open right half plane, is positive and the real part of $a - (a^2 + 1)^{1/2}$ is negative. The remainder of the proof is immediate.
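The scalar statement can be illustrated with Python's `cmath` module, whose `sqrt` and `log` return principal values. This is a minimal sketch; the function name and the sample points (all chosen off the set $\mathcal{E}$) are illustrative.

```python
import cmath
import math

def principal_asinh(a):
    """x = Log[a + (a^2 + 1)^(1/2)] with principal square root and logarithm."""
    return cmath.log(a + cmath.sqrt(a * a + 1))

samples = (0.5 + 2.0j, -2.0 + 0.5j, 3.0, -0.25j)
for a in samples:
    x = principal_asinh(a)
    # x solves sinh x = a and lies in the open strip |Im x| < pi/2
    print(a, x, abs(cmath.sinh(x) - a), abs(x.imag) < math.pi / 2)
```

For these points the result also agrees with the library function `cmath.asinh`, which uses the same branch cuts along the imaginary axis beyond $\pm i$.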

As a consequence of this fact, we have the following result. 

Lemma 1. If $A$ has no eigenvalues in $\mathcal{E}$ then there exists a unique inverse matrix hyperbolic sine of $A$ with eigenvalues in $\mathcal{D}$.

This inverse matrix hyperbolic sine is called the principal one and is denoted by $\mathrm{Sinh}^{-1} A$ (with a capital letter).

Contrary to the scalar case, the matrix equation $\sinh X = A$ may not have solutions. The following theorem states the conditions that $A$ must satisfy in order that this equation has a solution.

Theorem 1. If $A$ is a complex square matrix then $\sinh X = A$ has some complex solution if and only if the Jordan blocks of $A$ with size $\ge 2$ associated to the eigenvalues $i$ and $-i$ occur in pairs of the form

$$J_k(\lambda),\ J_k(\lambda) \quad \text{or} \quad J_k(\lambda),\ J_{k+1}(\lambda),$$

where $\lambda \in \{-i, i\}$ and $J_p(\lambda)$ denotes the Jordan block of $A$ associated to the eigenvalue $\lambda$ with size $p$.

If $i$ or $-i$ are eigenvalues of $A$, then no inverse matrix hyperbolic sine is a primary matrix function of $A$.

Proof. First we prove the necessary condition. If there exists $X$ such that $\sinh X = A$, then the eigenvalues of $A$ are hyperbolic sines of eigenvalues of $X$. If $\lambda \in \{-i, i\}$ is an eigenvalue of $A$, then the corresponding eigenvalues in $X$ are of the form

$$(2p \pm \tfrac{1}{2})\pi i, \quad p \in \mathbb{Z}.$$

Let $J_l(\mu)$ be a Jordan block of $X$ associated to $\mu$. Applying a result from [9], p. 424, we may conclude that the Jordan decomposition of $\sinh[J_l(\mu)]$ gives rise to a pair of blocks of the form

$$J_{l/2}(\lambda),\ J_{l/2}(\lambda), \quad \text{if } l \text{ is even,}$$

or

$$J_{(l+1)/2}(\lambda),\ J_{(l-1)/2}(\lambda), \quad \text{if } l \text{ is odd.}$$

Now we prove the sufficient condition. Decomposing $A$ in its Jordan canonical form, we may write $A = S\, \mathrm{diag}(A_1, A_2)\, S^{-1}$, where $S$ is nonsingular, $A_1$ is a direct sum of Jordan blocks with eigenvalues not lying on $\{-i, i\}$ and $A_2$ is a direct sum (with an even number) of Jordan blocks associated to $i$ or $-i$ which have the form described above. To get $\sinh^{-1} A$, it is enough to find $\sinh^{-1} A_1$ and $\sinh^{-1} A_2$. Using some results about primary matrix functions ([9], ch. 6), there exists at least one inverse matrix hyperbolic sine $\sinh^{-1} A_1$ which is a primary matrix function of $A_1$. To show that there exists $\sinh^{-1} A_2$ we may suppose, without loss of generality, that $A_2$ is a pair of blocks of the form $J_k(i) \oplus J_k(i)$. Decomposing $\sinh(J_{2k}[(2p + \tfrac{1}{2})\pi i])$ in its Jordan canonical form, we have

$$\sinh(J_{2k}[(2p + \tfrac{1}{2})\pi i]) = T[J_k(i) \oplus J_k(i)]T^{-1}$$

for some nonsingular $T$, which implies that

$$\sinh^{-1}[J_k(i) \oplus J_k(i)] = T^{-1} J_{2k}[(2p + \tfrac{1}{2})\pi i]\, T.$$

This proves the sufficient condition. If $A$ has eigenvalues $i$ or $-i$ then no $X$ such that $\sinh X = A$ can be a primary matrix function of $A$. In fact, if there exists such a function $f$ satisfying

$$X = f(A),$$

then $f$ cannot transform two Jordan blocks into one.



The inverse matrix hyperbolic sine of $A$ is not always a primary matrix function of $A$. However, if the spectrum of $A$, $\sigma(A)$, does not contain $\pm i$, there are some inverse matrix hyperbolic sines which are primary matrix functions of $A$, as guaranteed by the theorem. Functions of this type enjoy some special properties since they can be written as a polynomial in $A$. In particular, they commute with $A$.

The next corollary concerns the analysis of the matrix equation $\sinh X = A$ in the real case. Given a real matrix $A$, we want to know conditions under which this equation has a real solution. Before stating the result, we note that the nonreal eigenvalues of a real matrix occur in conjugate pairs. The number and the size of the Jordan blocks associated to a nonreal eigenvalue and to its conjugate are the same.

Corollary 1. If $A$ is a real square matrix then $\sinh X = A$ has some real solution if and only if the Jordan blocks of $A$ with size $\ge 2$ associated to the eigenvalue $i$ occur in pairs of the form

$$J_k(i),\ J_k(i) \quad \text{or} \quad J_k(i),\ J_{k+1}(i).$$

If $i$ is an eigenvalue of $A$ (so that $-i$ is also an eigenvalue) then no inverse matrix hyperbolic sine of $A$ is a primary matrix function of $A$.

Proof. The necessary condition is a consequence of the previous theorem. To prove the sufficient condition, we consider the scalar complex function

$$f(x) = \begin{cases} \mathrm{Log}(x + \sqrt{x^2 + 1}), & x \in \bigcup_{p=1}^{l} B_p, \\ \mathrm{Log}(x - \sqrt{x^2 + 1}), & x \in \bigcup_{q=1}^{m} C_q, \\ \mathrm{Log}(x + (x^2 + 1)^{1/2}), & \text{otherwise,} \end{cases}$$

where

— $\mathrm{Log}\, w$ denotes the principal logarithm;

— $\sqrt{w}$ denotes the square root of $w$ which lies on the open left half plane;

— $w^{1/2}$ denotes the principal square root of $w$;

— $\lambda_1, \dots, \lambda_l$ are the eigenvalues of $A$ of the form $\alpha i$, for some $\alpha > 1$;

— $\mu_1, \dots, \mu_m$ are the eigenvalues of $A$ of the form $-\alpha i$, for some $\alpha > 1$;

— $r = \min_{\lambda \ne \mu} |\lambda - \mu|$, where $\lambda$ and $\mu$ are eigenvalues of $A$;

— $B_p := \{x \in \mathbb{C} : |x - \lambda_p| < r/2\}$, $p = 1, \dots, l$;

— $C_q := \{x \in \mathbb{C} : |x - \mu_q| < r/2\}$, $q = 1, \dots, m$.

If $i$ is not an eigenvalue of $A$, $f$ is defined and has derivatives of any order for each $x \in \sigma(A)$. Thus $f(A)$ is a primary matrix function and it is not hard to prove that it is real. If $i$ is an eigenvalue and the associated Jordan blocks occur in pairs as described above, then the equation $\sinh X = A$ has a real solution. This follows from the proof of the previous theorem. To conclude the proof, it is enough to use an argument similar to that in the proof of the theorem to conclude that there is no inverse matrix hyperbolic sine which is a primary matrix function of $A$.



Since for any $a, x \in \mathbb{C}$ we have $\sinh x = a \iff e^x = a \pm (a^2 + 1)^{1/2}$, we may define an infinity of inverse hyperbolic sine functions, depending on the branch we take for the logarithm and for the square root. Using the notations above, we may define the principal inverse hyperbolic sine as

$$\mathrm{Sinh}^{-1} a = \mathrm{Log}[a + (a^2 + 1)^{1/2}], \quad a \in \mathbb{C}.$$

Since this function is differentiable if $a^2 + 1 \notin \mathbb{R}_0^-$, that is, $a \notin \mathcal{E}$ (see the beginning of this section), we may define the corresponding primary matrix function as

$$\mathrm{Sinh}^{-1} A = \mathrm{Log}[A + (A^2 + I)^{1/2}],$$

where $A$ is a complex matrix such that $\sigma(A) \cap \mathcal{E} = \emptyset$, with $\sigma(A)$ denoting the spectrum of $A$. The eigenvalues of $\mathrm{Sinh}^{-1} A$ lie on the strip $\mathcal{D}$ and if $A$ is real, then $\mathrm{Sinh}^{-1} A$ is also real. Moreover, if $\sigma(A) \cap \mathcal{E} = \emptyset$ then $\sinh(\mathrm{Sinh}^{-1} A) = A$ and if $\sigma(A) \subset \mathcal{D}$ then $\mathrm{Sinh}^{-1}(\sinh A) = A$.

We saw in section 2 that the hyperbolic sine of a P-skew-symmetric matrix is also P-skew-symmetric. In the following theorem we show that the converse is true for the principal inverse matrix hyperbolic sine.

Theorem 2. If $A$ is a P-skew-symmetric matrix and $\sigma(A) \cap \mathcal{E} = \emptyset$, then $\mathrm{Sinh}^{-1} A$ is also a P-skew-symmetric matrix.

Proof. It is enough to show that $B = A + (A^2 + I)^{1/2}$ is P-orthogonal (i.e., $B^T P B = P$) and use the fact that the principal matrix logarithm of a P-orthogonal matrix is P-skew-symmetric [2].

4 Algorithms 

The main algorithm to be presented here (algorithm 1) for computing the principal inverse matrix hyperbolic sine involves the computation of matrix logarithms and matrix square roots. It is well known that one of the most suitable methods for computing principal matrix square roots involves the Schur decomposition ([6,8]) and has been implemented in Matlab (version 5.2). An alternative method for the same purpose is the Denman & Beavers iterative method in [7]. For the matrix logarithm there are some methods proposed in the literature but there has not been agreement on the most suitable one. See [3] for a comparison among the methods and [12] for a new method. Here we use the so-called Briggs-Padé method, which combines an inverse squaring and scaling procedure with Padé approximants. The usual form of this method involves diagonal Padé approximants of the function $\log(1 - x)$. However, in [2], we presented an improved algorithm for this method which instead uses diagonal Padé approximants of the function

$$\log\left(\frac{1 + x}{1 - x}\right) = 2\tanh^{-1} x.$$

There, we showed that these approximants are well conditioned with respect to matrix inversion and that their use reduces the number of matrix square roots needed in the inverse squaring and scaling procedure. This reduction is important since it increases the accuracy of the resulting approximation.



Algorithm 1

This algorithm computes $\mathrm{Sinh}^{-1} A$, when $A$ is real and $\sigma(A) \cap \mathcal{E} = \emptyset$. $\varepsilon$ is a given tolerance.

1. Find the real Schur decomposition of $A$,

$$A = QRQ^T,$$

where $Q$ is orthogonal and $R$ is block upper triangular;

2. Set $T := R + (R^2 + I)^{1/2}$, where the matrix square roots may be computed by the function sqrtm of Matlab;

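3. Set $B_j := (T^{1/2^j} - I)(T^{1/2^j} + I)^{-1}$ and $u_j := 2\|B_j\| \left[ f(\|B_j\|^2) - f_{m-1,m}(\|B_j\|^2) \right]$, $j \in \mathbb{N}$, where

$$f(x) = \frac{1}{2x^{1/2}} \log\frac{1 + x^{1/2}}{1 - x^{1/2}}$$

and $S_{2m,2m}(x)$ is the $(2m, 2m)$ diagonal Padé approximant of $2\tanh^{-1} x$;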

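4. Compute $k$ successive square roots of $T$ until $\|B_k\| < 1$ and $u_k < \varepsilon$;

5. Compute $S_{2m,2m}(B_k)$;

6. Approximate $\mathrm{Log}\, T$ using the relations

$$\mathrm{Log}(T) = 2^k\, \mathrm{Log}(T^{1/2^k}) \approx 2^k S_{2m,2m}(B_k).$$

7. Set $\mathrm{Sinh}^{-1} A = Q(\mathrm{Log}\, T)Q^T$.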



The most expensive step in the algorithm is the first. The cost of computing 
the real Schur decomposition is about $25n^3$ flops ([5], 7.5.6). After this step all
the matrices involved are block triangular and, in this case, the cost of taking one 
matrix square root is about flops [8]. To guarantee full precision in Matlab 
(with relative machine epsilon $\epsilon \approx 2.2 \times 10^{-16}$), a good compromise between taking many square roots and increasing the order of the Padé approximants is to use the $S_{88}$ Padé approximant.

When $\|B_k\| < 1$, the nonnegative real number $u_k$ in step 4 measures the accuracy of the approximation computed for the logarithm by the diagonal Padé approximant $S_{2m,2m}(B_k)$, since

$$\|\mathrm{Log}[(I + B_k)(I - B_k)^{-1}] - S_{2m,2m}(B_k)\| \le u_k.$$

The computation of $S_{2m,2m}(B_k) = P_{2m}(B_k)[Q_{2m}(B_k)]^{-1}$, where $P_{2m}$ and $Q_{2m}$ are polynomials of degree at most $2m$, needs about $(2r + s - 2)n^3$ flops if $sr = 2m$, with $s = \lfloor \sqrt{2m} \rfloor$, $r = \lfloor 2m/s \rfloor$, and $(2r + s)n^3$ flops otherwise [3].

We note that in the last step $\mathrm{Sinh}^{-1} A = Q\, \mathrm{Log}[R + (R^2 + I)^{1/2}]\, Q^T$.

If in algorithm 1 we omit step 1 and compute the matrix square root using the Denman & Beavers method, we obtain a new algorithm, say algorithm 2, which also computes $\mathrm{Sinh}^{-1} A$ whenever $\sigma(A) \cap \mathcal{E} = \emptyset$. One of the advantages of using this new algorithm, instead of the first one, is that it is structure preserving (in exact arithmetic) for all P-symmetric matrices that satisfy the spectral assumption. This is due to the fact that the iterative method of Denman & Beavers involves only inverses and sums, which preserve P-symmetry, and the diagonal Padé approximants used in the logarithm also preserve this kind of structure [2].
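A minimal sketch of the idea behind algorithm 2, restricted to 2×2 real matrices and written in plain Python, is given below. It is not the authors' Matlab implementation: the stopping rule is simplified to a fixed threshold on $\|B_k\|$ (an assumption replacing the test $u_k < \varepsilon$), the Denman & Beavers iteration is used for every square root, the $(8,8)$ approximant is the one quoted in section 5, and all function names are illustrative.

```python
I2 = [[1.0, 0.0], [0.0, 1.0]]

def add(a, b, s=1.0):
    """Return a + s*b for 2x2 matrices."""
    return [[a[i][j] + s * b[i][j] for j in range(2)] for i in range(2)]

def mul(a, b):
    return [[sum(a[i][t] * b[t][j] for t in range(2)) for j in range(2)]
            for i in range(2)]

def scale(a, s):
    return [[s * v for v in row] for row in a]

def inv(a):
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det], [-a[1][0] / det, a[0][0] / det]]

def fro(a):
    return sum(v * v for row in a for v in row) ** 0.5

def db_sqrt(m, max_iter=100):
    """Denman-Beavers iteration: Y tends to m^(1/2), Z to m^(-1/2)."""
    y, z = m, I2
    for _ in range(max_iter):
        y_new = scale(add(y, inv(z)), 0.5)
        z = scale(add(z, inv(y)), 0.5)      # uses the previous y
        converged = fro(add(y_new, y, -1.0)) <= 1e-15 * fro(y_new)
        y = y_new
        if converged:
            break
    return y

def horner(b2, coeffs):
    """Evaluate a polynomial in b2 (coefficients from highest degree)."""
    acc = scale(I2, coeffs[0])
    for c in coeffs[1:]:
        acc = add(mul(acc, b2), I2, c)
    return acc

def s88(b):
    """(8, 8) Pade approximant of 2*atanh evaluated at the matrix b."""
    b2 = mul(b, b)
    num = scale(mul(b, horner(b2, [15159.0, -147455.0, 345345.0, -225225.0])), -2.0)
    den = scale(horner(b2, [35.0, -1260.0, 6930.0, -12012.0, 6435.0]), 35.0)
    return mul(num, inv(den))

def cayley(t):
    return mul(add(t, I2, -1.0), inv(add(t, I2)))   # (T - I)(T + I)^(-1)

def asinh_sketch(a, max_norm=0.25, max_roots=60):
    t = add(a, db_sqrt(add(mul(a, a), I2)))         # T = A + (A^2 + I)^(1/2)
    k, b = 0, cayley(t)
    while fro(b) >= max_norm and k < max_roots:     # simplified stopping rule
        t, k = db_sqrt(t), k + 1
        b = cayley(t)
    return scale(s88(b), 2.0 ** k)                  # Log T ~ 2^k S88(B_k)

def expm(x, terms=60):
    result, term = I2, I2
    for m in range(1, terms):
        term = scale(mul(term, x), 1.0 / m)
        result = add(result, term)
    return result

A = [[0.5, 2.0], [-1.0, 0.3]]                       # eigenvalues 0.4 +/- 1.41i
X = asinh_sketch(A)
sinh_X = scale(add(expm(X), expm(scale(X, -1.0)), -1.0), 0.5)
residual = fro(add(sinh_X, A, -1.0))
print(residual)
```

The residual $\|\sinh X - A\|$ is evaluated with a truncated Taylor series for the exponential, which is adequate only because $X$ has small norm in this example.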



Since the spectrum of skew-symmetric matrices is purely imaginary, they may not satisfy the condition $\sigma(A) \cap \mathcal{E} = \emptyset$. For these matrices we propose a different algorithm to compute $\mathrm{Sinh}^{-1} A$.

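Algorithm 3

$A$ is any real skew-symmetric matrix.

1. Find the real Schur form of $A$,

$$A = QDQ^T,$$

where $Q$ is orthogonal, $D = \mathrm{diag}(0, \dots, 0, A_1, \dots, A_l)$, and

$$A_k = \begin{bmatrix} 0 & \alpha_k \\ -\alpha_k & 0 \end{bmatrix}, \quad \alpha_k > 0, \quad k = 1, \dots, l.$$

2. Set $\mathrm{Sinh}^{-1} A = Q\, \mathrm{diag}(0, \dots, 0, X_1, \dots, X_l)\, Q^T$, where, for all $k = 1, \dots, l$,

$$X_k = \begin{cases} \begin{bmatrix} \ln[\alpha_k + (\alpha_k^2 - 1)^{1/2}] & \pi/2 \\ -\pi/2 & \ln[\alpha_k + (\alpha_k^2 - 1)^{1/2}] \end{bmatrix}, & \text{if } \alpha_k > 1, \\[2ex] \begin{bmatrix} 0 & \cos^{-1}(1 - \alpha_k^2)^{1/2} \\ -\cos^{-1}(1 - \alpha_k^2)^{1/2} & 0 \end{bmatrix}, & \text{if } 0 < \alpha_k \le 1. \end{cases}$$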
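The 2×2 blocks of step 2 can be checked directly. The sketch below assumes the block convention $A_k = \begin{bmatrix} 0 & \alpha \\ -\alpha & 0 \end{bmatrix}$ and verifies $\sinh X_k = A_k$ with a truncated Taylor series for the exponential; the function names and the sample values of $\alpha$ are illustrative.

```python
import math

def asinh_block(alpha):
    """Block X_k of step 2 for A_k = [[0, alpha], [-alpha, 0]], alpha > 0."""
    if alpha > 1:
        d = math.log(alpha + math.sqrt(alpha * alpha - 1))
        return [[d, math.pi / 2], [-math.pi / 2, d]]
    theta = math.acos(math.sqrt(1 - alpha * alpha))
    return [[0.0, theta], [-theta, 0.0]]

def mul(a, b):
    return [[sum(a[i][t] * b[t][j] for t in range(2)) for j in range(2)]
            for i in range(2)]

def expm(x, terms=60):
    """Truncated Taylor series of the 2x2 matrix exponential."""
    result = [[1.0, 0.0], [0.0, 1.0]]
    term = [[1.0, 0.0], [0.0, 1.0]]
    for m in range(1, terms):
        term = [[v / m for v in row] for row in mul(term, x)]
        result = [[result[i][j] + term[i][j] for j in range(2)] for i in range(2)]
    return result

def sinhm(x):
    plus = expm(x)
    minus = expm([[-v for v in row] for row in x])
    return [[0.5 * (plus[i][j] - minus[i][j]) for j in range(2)] for i in range(2)]

def residual(alpha):
    """Largest entry of sinh(X_k) - A_k in absolute value."""
    s = sinhm(asinh_block(alpha))
    target = [[0.0, alpha], [-alpha, 0.0]]
    return max(abs(s[i][j] - target[i][j]) for i in range(2) for j in range(2))

print(residual(3.0), residual(1.0), residual(0.6))
```

At $\alpha_k = 1$ the two formulas coincide, since $\ln 1 = 0$ and $\cos^{-1} 0 = \pi/2$.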



Remark. When $A$ is skew-symmetric and does not satisfy $\sigma(A) \cap \mathcal{E} = \emptyset$, we have no guarantee that the inverse matrix hyperbolic sine of $A$ is skew-symmetric. A necessary and sufficient condition for $\sinh X = A$ to have a skew-symmetric solution is that the eigenvalues of $A$ are of the form $\alpha i$, with $|\alpha| \le 1$. To prove the necessary condition, we suppose that $X$ is a skew-symmetric matrix such that $\sinh X = A$. Then the eigenvalues of $X$ are of the form $\pm\beta i$, $\beta \in \mathbb{R}$, and the eigenvalues of $A$ are hyperbolic sines of eigenvalues of $X$, that is, they are of the form

$$\sinh(\pm\beta i) = \pm i \sin\beta,$$

where $|\alpha| = |\sin\beta| \le 1$. The sufficient condition is an immediate consequence of facts discussed previously.

5 Numerical Experiments 

We have implemented algorithms 1 and 2 in Matlab (with relative machine epsilon $\epsilon \approx 2.2 \times 10^{-16}$) on a Pentium II. We used the Frobenius norm, (8, 8) Padé approximants and a tolerance of $\varepsilon = \|A\| \times 10^{-16}$. The expressions for $S_{88}(x)$ and $u_j$ are:

$$S_{88}(x) = \frac{-2x(15159x^6 - 147455x^4 + 345345x^2 - 225225)}{35(35x^8 - 1260x^6 + 6930x^4 - 12012x^2 + 6435)},$$

$$u_j = 2\|B_j\| \left[ f(\|B_j\|^2) - f_{3,4}(\|B_j\|^2) \right],$$

where

$$f(x) = \frac{1}{2x^{1/2}} \log\frac{1 + x^{1/2}}{1 - x^{1/2}}.$$
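As a quick scalar sanity check of the $(8,8)$ approximant quoted above, the sketch below compares it with $2\tanh^{-1} x$ at a few sample points; the function name and the sample points are illustrative choices.

```python
import math

def s88(x):
    """Scalar (8, 8) Pade approximant of 2*atanh(x), in Horner form in x^2."""
    x2 = x * x
    num = -2.0 * x * (((15159.0 * x2 - 147455.0) * x2 + 345345.0) * x2 - 225225.0)
    den = 35.0 * ((((35.0 * x2 - 1260.0) * x2 + 6930.0) * x2 - 12012.0) * x2 + 6435.0)
    return num / den

for x in (0.05, 0.1, 0.25):
    print(x, abs(s88(x) - 2.0 * math.atanh(x)))
```

The Taylor expansions of the two functions agree through the term in $x^{15}$, so the difference is at rounding level for small $|x|$ and grows as $|x|$ approaches 1, which is why the square-rooting stage drives $\|B_k\|$ down before the approximant is applied.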



In order to measure the relative error of the computed inverse hyperbolic sine $X$, we used the quantity

$$\text{error} = \frac{\|\sinh X - A\|}{\|A\|},$$

where the hyperbolic sine was computed using the function expm of Matlab.

We tested several Hilbert matrices of orders n < 15, which have a large 
condition number. In both algorithms the relative error varied between 10“^® 
and 10”®"*. We also tested the matrix 



0.0001 0.9999 lO"* O.OOOl' 
-0.9999 0.0001 -10® 10® 

0 0 0.0001 1.0001 ’ 
0 0 - 1.0001 0.0001 



which has a large condition number, cond(H) = 1.0101 x 10 ^ 2 ^ and eigenvalues 
close to i and —i. In this case, we noticed a loss of accuracy of about 8 significant 
digits. The relative error was about 5.6052 x 10“® in both algorithms. This result 
was somewhat expected since the computation of $(R^2 + I)^{1/2}$ in step 2 has lost
about six significant digits of accuracy. 



To study the behaviour of both algorithms with respect to preserving the structure of P-symmetric matrices, we considered several particular examples with

$$P = \begin{bmatrix} 0 & I_k \\ -I_k & 0 \end{bmatrix}, \text{ with } 2k = n, \quad \text{and} \quad P = \begin{bmatrix} I_p & 0 \\ 0 & -I_q \end{bmatrix}, \text{ with } p + q = n,$$

of orders 6, 7 and 8. In both algorithms we observed that for matrices with a moderate condition number the original structure was preserved, although the inverse was computed by a method that does not preserve such structure. For matrices with a large condition number, our tests showed that algorithm 2 is slightly better for the first choice of $P$ and that both algorithms rarely preserved the structure for the second choice of $P$.

Based on examples tested, we observe that a reduction in accuracy may occur
when A has a large condition number.

Abstract. We consider different preconditioning techniques of both implicit and explicit form in connection with Krylov methods for the solution of large dense complex symmetric non-Hermitian systems of equations arising in computational electromagnetics. We emphasize in particular sparse approximate inverse techniques that use a static nonzero pattern selection. By exploiting geometric information from the underlying meshes, a very sparse but effective preconditioner can be computed. In particular our strategies are applicable when fast multipole methods are used for the matrix-vector products on parallel distributed memory computers.

Keywords: Preconditioning techniques, sparse approximate inverses, nonzero pattern selection strategies, electromagnetic scattering applications.

AMS subject classification: 65F10, 65F50, 65N38, 65R20, 78A45, 78A50, 78-08

1 Introduction 

A considerable amount of work has been recently spent on the simulation of 
electromagnetic wave propagation phenomena, addressing various topics ranging 
from radar cross section to electromagnetic compatibility, absorbing materials, 
and antenna design. The physical issue is to compute the diffraction pattern 
of the scattered wave, given an incident field and a scattering obstacle. The 
Boundary Element Method {BEM) is a reliable alternative to more classical dis- 
cretization schemes like Finite Element Methods and Finite Difference Methods 
for the numerical solution of this class of problems. The idea of BEM is to 
shift the focus from solving a partial differential equation defined on a closed or 
unbounded domain to solving a boundary integral equation only over the finite 
part of the boundary. This approach leads to the solution of linear systems of 
equations of the form 

Ax = b, (1)

where the coefficient matrix $A = [a_{ij}]$ is a large, dense, complex matrix of order $n$ arising from the discretization. The coefficient matrix can be symmetric non-Hermitian in the EFIE (Electric Field Integral Equation) formulation, or unsymmetric in the CFIE (Combined Field Integral Equation) formulation. Direct dense methods based on Gaussian elimination are often the method of choice for solving such systems, because they are reliable and predictable both in terms of accuracy and cost. However, for large-scale problems they become impractical even on large parallel platforms because they require storage of $n^2$ double precision complex entries of the coefficient matrix and $O(n^3)$ floating-point operations to compute the factorization. Iterative Krylov subspace based solvers can be a promising alternative provided we have fast matrix-vector multiplications and robust preconditioners. Here we focus on the design of robust preconditioning techniques. The paper is organized as follows: in Section 1 we introduce the problem and we discuss some issues addressed by the design of the preconditioner for this class of problems; in Section 2 we report on the results of our numerical investigations and finally, in Section 3, we propose a few tentative conclusions arising from the work.

1.1 The Design of the Preconditioner 

A preconditioner M is any matrix that can accelerate the convergence of
iterative solvers. The original system (1) is replaced with a new system of the
form $M^{-1}Ax = M^{-1}b$ when preconditioning from the left, and $AM^{-1}y = b$,
with $x = M^{-1}y$, when preconditioning from the right. To be effective, a good
preconditioner has to be a close approximation of A, easy to construct, and cheap
to store and to apply. For dense matrices, some additional constraints have to
be considered. The choice of the best preconditioning family can require more
effort than in the sparse case, because for dense systems there are far fewer results.
When the coefficient matrix of the linear system is dense, the construction of
even a very sparse preconditioner may become too expensive in execution time
as the problem size increases. In some contexts, such as the multipole setting, not
all the entries of the coefficient matrix are directly available and the preconditioner
has to be constructed from a sparse approximation of A, possibly computed by
accessing only local information. Thus a suitable pattern is required to select a
representative set of the entries of A to build M.
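
As a small illustration of the two formulations above, the following sketch solves the same system through its left- and right-preconditioned forms and recovers the same solution. The function name `solve_preconditioned`, the 3 x 3 test matrix and the choice of M as the diagonal of A are assumptions made only for this example; dense direct solves stand in for the Krylov iterations used in practice.

```python
import numpy as np

def solve_preconditioned(A, M, b):
    """Solve Ax = b through the left- and right-preconditioned systems."""
    # Left preconditioning: M^{-1} A x = M^{-1} b
    x_left = np.linalg.solve(np.linalg.solve(M, A), np.linalg.solve(M, b))
    # Right preconditioning: A M^{-1} y = b, followed by x = M^{-1} y.
    # (M^T)^{-1} A^T is the transpose of A M^{-1}, so no explicit inverse is formed.
    A_Minv = np.linalg.solve(M.T, A.T).T
    y = np.linalg.solve(A_Minv, b)
    x_right = np.linalg.solve(M, y)
    return x_left, x_right

# Illustrative data: a small complex matrix and a diagonal preconditioner.
A = np.array([[4.0, 1.0j, 0.0],
              [1.0, 4.0, 1.0j],
              [0.0, 1.0, 4.0]], dtype=complex)
M = np.diag(np.diag(A))
b = A @ np.ones(3)          # right-hand side whose exact solution is all ones
x_left, x_right = solve_preconditioned(A, M, b)
```

Both variants return the solution of the original system; they differ only in which residual an iterative solver would monitor.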

Parallelism also suggests considering preconditioning techniques of explicit
form, which compute an approximation to the inverse of A: the application of
the preconditioner then reduces to a matrix-vector (M-V) product at each step,
which is a highly parallelizable kernel on both shared and distributed
memory machines. Some of these techniques require a sparse pattern to be
prescribed in advance for the approximate inverse, one able to capture most of
the large entries of $A^{-1}$. Thus in that case an effective pattern is also
required for the preconditioner.

Algebraic Strategy. In the BEM context the matrices arising from the
discretization of the problem exhibit regular structure: the largest entries are
located on the main diagonal, and only a few adjacent bands have entries of
high magnitude. Most of the remaining entries have much smaller modulus. In
Figure 1(a), we plot for a cylindrical geometry the matrix obtained by scaling
$A = [a_{ij}]$ so that $\max_{i,j} |a_{ij}| = 1$, and discarding from A all entries
less than $\varepsilon = 0.05$ in modulus. This matrix has 16 non-zeros per row on
average and its size is 1080. Several heuristics based on algebraic information
can be used to extract a sparsity pattern from A that retains the main
contributions to the singular integrals [2].
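
The scaling and thresholding step just described can be sketched as follows. This is only a minimal illustration on a small matrix stored as nested lists: the names `sparsify` and `nonzero_pattern` and the test matrix are hypothetical, and A is assumed to have at least one nonzero entry.

```python
def sparsify(A, eps=0.05):
    """Scale A so that max |a_ij| = 1, then zero all entries below eps in modulus."""
    amax = max(abs(a) for row in A for a in row)   # assumes A is not identically zero
    return [[a / amax if abs(a) / amax >= eps else 0 for a in row] for row in A]

def nonzero_pattern(A):
    """Column indices of the nonzero entries of each row."""
    return [[j for j, a in enumerate(row) if a != 0] for row in A]

# Illustrative 3 x 3 complex matrix with a dominant diagonal.
A = [[4 + 0j, 0.1, 2j],
     [0.1j, -4, 1],
     [0.3, 0.05, 4j]]
pattern = nonzero_pattern(sparsify(A, eps=0.05))
```

For the matrices of the paper one would store the result in a sparse format; nested lists are used here only to keep the sketch dependency-free.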





Fig. 1. Nonzero pattern for A (left) and $A^{-1}$ (right) when the smallest entries
are discarded. The test problem is a cylinder



On smooth geometries, due to the decay of the Green's function, the regular
structure of A is generally maintained for its inverse as well. Figure 1(b) shows
the pattern of sparsified($A^{-1}$), where $A^{-1}$ has been computed using LAPACK
library routines, and then sparsified, after scaling, with the same value of the
threshold as the one used to produce Figure 1(a). This pattern selection strategy,
referred to as the algebraic strategy, can be effective for constructing preconditioners
of both implicit and explicit form, but it requires access to all the entries of the
coefficient matrix; for large problems this can become too expensive or even
impossible, as in a multipole framework.

Relevant information for the construction of the preconditioner can be ex- 
tracted from the meshes of the underlying physical problem. In particular, two 
types of information are directly available: 

the connectivity graph, describing the topological neighbourhood amongst the 
edges, and 

the coordinates of the nodes in the mesh, describing geometric neighbourhoods 
amongst the edges. 
Topological Strategy. In the integral equation context that we consider here,
the object surface is discretized by a triangular mesh using the so-called flux finite
elements or Rao-Wilton-Glisson elements [7]. Each degree of freedom (DOF),
corresponding to an unknown in the linear system, represents the vectorial flux
across each edge in the triangular network. Topological neighbourhoods can be
defined according to the concept of level k neighbours, as introduced in [6].
Level 1 neighbours of a DOF are the DOF plus the four DOFs belonging to
the two triangles that share the edge corresponding to the DOF itself. Level 2
neighbours are all the level 1 neighbours plus the DOFs in the triangles that are
neighbours of the two triangles considered at level 1, and so forth. In Figure 2
we plot, for each DOF of the mesh for the same cylindrical geometry considered
before, the magnitude of the associated entry in A (the graph on the left) and
in $A^{-1}$ (the graph on the right) with respect to the level of its neighbours. In
both cases the large entries derive from the interaction of a very localized set of
edges in the mesh, so that by retaining a few levels of neighbours for each DOF
an effective pattern to approximate both A and $A^{-1}$ is likely to be constructed.
A pattern selection strategy based on topological information is referred to as
the topological strategy.
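
The definition of level k neighbours can be sketched as a breadth-first growth over triangles. The representation below is an assumption made for illustration: each triangle is given as a triple of edge (DOF) indices, two triangles are neighbours when they share an edge, and the function name `level_k_neighbours` is hypothetical.

```python
def level_k_neighbours(triangles, dof, k):
    """Return the sorted level-k neighbours of an edge DOF.

    triangles: list of triples of edge indices (one triple per triangle).
    Level 1 collects the edges of the triangles containing `dof`; each further
    level adds the edges of the triangles adjacent to those already reached.
    """
    edge_to_tris = {}
    for t, edges in enumerate(triangles):
        for e in edges:
            edge_to_tris.setdefault(e, set()).add(t)

    reached = set(edge_to_tris[dof])          # triangles sharing the edge
    for _ in range(k - 1):
        grown = set(reached)
        for t in reached:
            for e in triangles[t]:
                grown |= edge_to_tris[e]      # triangles across each edge
        reached = grown
    return sorted({e for t in reached for e in triangles[t]})

# Illustrative strip of four triangles; consecutive triangles share one edge.
strip = [(0, 1, 2), (2, 3, 4), (4, 5, 6), (6, 7, 8)]
level1 = level_k_neighbours(strip, dof=2, k=1)
level2 = level_k_neighbours(strip, dof=2, k=2)
```

Here edge 2 is shared by the first two triangles, so its level 1 neighbourhood contains the edge itself plus four others, as in the definition above.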









(a) Magnitude vs. levels for A    (b) Magnitude vs. levels for $A^{-1}$

Fig. 2. Topological localization in the mesh for the large entries of A (left)
and $A^{-1}$ (right). The test problem is a cylinder and is representative of the
general behaviour



Geometric Strategy. For the same scattering problem previously considered,
we plot in Figure 3, for each pair of edges in the mesh, the magnitude of
their associated entries in A and $A^{-1}$ with respect to their distance in terms
of the wavelength of an incident electromagnetic radiation. The wavelength is
a physical parameter affecting the complexity of the problem to be solved. For
an accurate representation of the oscillating solution of Maxwell's equations, in
fact, around ten points per wavelength need to be used for the discretization.
The largest entries of A and $A^{-1}$ are strongly localized in a similar fashion.
The pattern for constructing an approximation of A or $A^{-1}$ can be computed
by selecting for each edge all those edges within a sufficiently large sphere that
defines our geometric neighbourhood. In the case of preconditioning techniques
of explicit form, by using a suitable size for this sphere we hope to include the
most relevant contributions to the inverse and consequently to obtain an effective
sparse approximate inverse. When the surface of the object is very non-smooth,
these large entries may come from the interaction of edges that are far apart or
not connected in a topological sense, but are neighbours in a geometric sense. Thus
this approach is more promising for handling complex geometries where parts of
the surface are not connected. This selection strategy will be referred to as the
geometric strategy.
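
A minimal sketch of the geometric selection follows. It assumes that each edge is represented by one point (for instance its midpoint, which is an assumption of this example and not stated in the text) and that the sphere radius is given as a fraction of the wavelength; the name `geometric_pattern` is hypothetical.

```python
import math

def geometric_pattern(centres, wavelength, factor=0.12):
    """For each edge, list the edges whose centre lies within the sphere
    of radius factor * wavelength around its own centre."""
    radius = factor * wavelength
    return [
        [j for j, cj in enumerate(centres) if math.dist(ci, cj) <= radius]
        for ci in centres
    ]

# Illustrative data: two nearby edges and one far-away edge, unit wavelength.
centres = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (1.0, 0.0, 0.0)]
pattern = geometric_pattern(centres, wavelength=1.0)
```

The double loop costs $O(n^2)$ distance evaluations; for large meshes a spatial search structure would be the natural replacement, but that is beyond this sketch.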






(a) Magnitude vs. distance for A    (b) Magnitude vs. distance for $A^{-1}$

Fig. 3. Geometric localization in the mesh for the large entries of A (left)
and $A^{-1}$ (right). The test problem is a cylinder. This is representative of the
general behaviour



2 Numerical Experiments 

In this section we compare the performance of different preconditioning tech- 
niques in connection with Krylov solvers on a selected set of test problems. 
Amongst the test cases considered in [2], we select the following three examples,
corresponding to bodies with different geometries: 

Example 1: Cylinder with a hollow inside, a matrix of order n = 1080; 
Example 2: Cylinder with a break on the surface, a matrix of order n = 1299; 
Example 3: Sphere, a matrix of order n = 1080, 

where, for physical consistency, we set the frequency of the wave so that there 
are about ten discretization points per wavelength. 
We use, amongst Krylov methods, restarted GMRES [8], Bi-CGSTAB [9],
symmetric and nonsymmetric QMR [4], and TFQMR [3]. We consider the following
preconditioning techniques, all computed by replacing A with its sparse
approximation, referred to as sparsified(A), and implemented as right preconditioners:

— SSOR;

— ILU(0), the incomplete LU factorization with zero level of fill-in, applied to
sparsified(A);

— FROB, a Frobenius norm minimization technique, with the pattern of
sparsified(A) prescribed in advance for the approximate inverse;

— SPAI, introduced in [5], with the adaptive strategy implemented in the
MI12 routine from HSL;

— SLU, a complete LU factorization of sparsified(A), used as implicit
preconditioner.

For comparison purposes, we also report on results for the unpreconditioned case
and using a simple diagonal scaling. The stopping criterion in all cases consists in
reducing the original residual by $10^{-5}$. The symbol "-" means that convergence is
not obtained after 500 iterations. In each case, we take as the initial guess $x_0 = 0$,
and the right-hand side is such that the exact solution of the system is formed by
all ones. All the numerical experiments refer to runs in double precision complex
arithmetic on a SUN workstation.

The patterns used to construct sparsified(A) and all the preconditioners are
computed by using the geometric strategy, retaining all those entries within a sphere
of radius 0.12 times the wavelength. We try to have the same number of
nonzeros in the different preconditioners resulting from the various methods: in the
incomplete LU factorization, no additional level of fill-in is allowed in the fac- 
tors; in the Frobenius-norm minimization technique, the same sparsity pattern 
prescribed on A (and then exactly the same number of nonzero entries) is im- 
posed on the preconditioner; with SPAI we choose a priori, for each column 
of M, the same fixed maximum number of nonzeros as in the computation of 
sparsified(A); and finally for the SLU method, sparsified(A) is factorized us- 
ing MF47, a sparse direct solver from HSL, and those exact factors are used as 
the preconditioner. The efficient implementation of the MF47 solver guarantees 
a minimal fill-in in the factors. We do not report on results with the AINV 
preconditioner [1] because they are discouraging. 

Amongst the different techniques, Frobenius norm minimization methods are the
most promising; they are highly parallelizable and numerically effective. The
least-squares (L-S) solutions require some computational effort, but the patterns
computed by the geometric strategy are generally very sparse, and the resulting
least squares problems are small and can be solved effectively via a dense QR
factorization. As can be seen in Table 1, ILU preconditioners are not effective for
such systems. In our tests, modifications of the coefficient matrix do not help to
improve their robustness. Better performance can be obtained by allowing more
fill-in in the factors, but at the price of increased computational cost and storage
requirements. The SLU preconditioner represents in this sense an extreme case
Table 1. Number of iterations required by different preconditioned Krylov
solvers to reduce the residual by $10^{-5}$

Example 1 - Density of M = 5.03%

Precond.   GMRES(m)                                Bi-CGStab  UQMR     TFQMR
           m=10    m=30    m=50    m=80    m=110
Unprec     -       -       -       251     202     293        258      170
Mj         -       -       465     222     174     239        210      169
SSOR       -       417     199     137     101     116        154      126
ILU(0)     -       -       -       -       -       -          -        -
FROB       134     83      49      49      49      53         57       47
SPAI       -       -       -       -       -       -          340      465
SLU        -       -       377     223     178     236        244      265

Example 2 - Density of M = 1.59%

Precond.   GMRES(m)                                Bi-CGStab  UQMR     TFQMR
           m=10    m=30    m=50    m=80    m=110
Unprec     -       -       -       398     289     321        405      251
Mj         -       -       473     330     243     257        354      228
SSOR       -       363     236     157     126     153        246      136
ILU(0)     -       -       -       160     97      -          273      437
FROB       114     88      68      57      57      45         85       46
SPAI       -       -       -       -       -       -          -        -
SLU        -       -       -       318     206     412        499      -

Example 3 - Density of M = 1.50%

Precond.   GMRES(m)                                Bi-CGStab  UQMR     TFQMR
           m=10    m=30    m=50    m=80    m=110
Unprec     202     62      61      57      57      75         69       40
Mj         175     71      67      59      59      80         71       46
SSOR       176     87      77      63      63      82         80       55
ILU(0)     -       -       -       470     330     -          284      217
FROB       15      14      14      14      14      10         19       10
SPAI       -       -       -       -       -       -          -        -
SLU        385     143     107     74      74      73         95       68

with respect to ILU(0) since a complete fill-in is allowed in the factors. This
approach, although not easily parallelizable, is generally effective on this class
of applications for dense enough sparse approximations of A. But if the pattern
is very sparse, approximate inverse techniques prove to be more robust. SSOR,
compared to FROB, is generally slower in terms of iterations, but is very cheap to
compute. However, it is not easily parallelizable, and the extra cost of computing
an approximate inverse can be offset by the time saved in the iterations when
solving the same linear system with many right-hand sides. This is often the case
in electromagnetic applications, when illuminating an object with various waves
corresponding to different angles of incidence.
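
The Frobenius norm minimization idea can be sketched as follows, under the assumption that the right preconditioner M minimizes $\|AM - I\|_F$ with a prescribed pattern, so that each column is an independent small least squares problem solved by a dense QR factorization. The function name `frobenius_approximate_inverse`, the dense storage and the test matrix are illustrative assumptions, not the implementation used for the experiments.

```python
import numpy as np

def frobenius_approximate_inverse(A, pattern):
    """Approximate inverse M minimizing ||A M - I||_F, where column j of M
    may be nonzero only in the rows listed in pattern[j]."""
    n = A.shape[0]
    M = np.zeros((n, n), dtype=complex)
    for j in range(n):
        J = pattern[j]
        e_j = np.zeros(n, dtype=complex)
        e_j[j] = 1.0
        # Column j decouples: minimize || A[:, J] m - e_j ||_2 via a reduced QR.
        # A[:, J] is assumed to have full column rank.
        Q, R = np.linalg.qr(A[:, J])
        M[J, j] = np.linalg.solve(R, Q.conj().T @ e_j)
    return M

# Illustrative 3 x 3 complex matrix.
A = np.array([[4.0, 1.0, 0.0],
              [1.0j, 4.0, 1.0],
              [0.0, 1.0, 4.0]], dtype=complex)
M_full = frobenius_approximate_inverse(A, [[0, 1, 2]] * 3)    # full pattern
M_diag = frobenius_approximate_inverse(A, [[0], [1], [2]])    # diagonal pattern
```

With the full pattern the exact inverse is recovered; with a sparser pattern the result is the best approximation in the Frobenius norm for that pattern. Since the columns are independent, the loop over j could run in parallel.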

3 Conclusions 

Iterative methods can present an attractive alternative to direct methods even for 
the solution of this class of problems, especially when great accuracy for the so- 
lution is not demanded, as is often the case for physical problems. The behaviour 
of these techniques is strongly dependent on the choice of the preconditioner. 
Frobenius norm minimization methods are the most promising candidates to
precondition these problems effectively; they deliver a good rate of convergence,
and are inherently parallel. The numerical experiments have shown that, using 
additional geometric information from the underlying mesh, we can compute a 
very sparse but effective preconditioner. This pattern selection strategy does not 
require access to all the entries of the matrix A, so that it is promising for an 
implementation in a fast multipole setting where A is not directly available but 
where only the near field entries are computed. 
Abstract. The limbic circuit, involving the prefrontal cortex, hippocampus
and certain subcortical structures, plays a decisive role in emotional
activity and in understanding psychopathologies such as schizophrenia
and sensitization to certain psychostimulants.

In this work, we construct a nonlinear network representing the
interaction between seven important nuclei of the limbic system.

This model is a first approach that allows us to simulate different
activities of the circuit, associated with dopamine sensitization and the
neurodevelopmental hypothesis for the neuropathology of schizophrenia.



1 Introduction 

Pathophysiological processes that underlie the profound neuropsychiatric
disturbances in schizophrenia are poorly understood^. However, considerable
evidence from clinical, neuropsychological, brain imaging and postmortem anatomical
studies strongly implicates prefrontal-temporo-limbic cortical dysfunctions
in schizophrenia^’®.

Physiologically speaking, the dopamine and the neurodevelopmental hypotheses
of schizophrenia postulate that at least some forms of schizophrenia could
have their origin in an early neurodevelopmental defect (damage in the ventral
hippocampus and/or the prefrontal cortex) that may result in
prefrontal-temporo-limbic cortical dysfunctions (recovered lesion) in early adulthood
(overactivity in neurotransmission from DA cell bodies, located in the ventral
tegmental area (VTA) of the midbrain, and hypofunction of the prefrontal cortex)^’^’®.

We model a network composed of seven nuclei from the limbic circuit, which
is a prefrontal-temporo-limbic cortical circuit involved in the pathology of
schizophrenia^’'^’®. The nuclei (Fig. 1) are interconnected in an excitatory and inhibitory
way, via glutamatergic, dopaminergic (DA) and GABAergic neurotransmitters.
Using the physiological information obtained about the limbic circuit, we
constructed a model which describes the dynamics of the interaction between the
nuclei involved.

The model consists of a system of first-order nonlinear differential equations,
obtained from a general balance equation for the density of excitatory
and inhibitory synaptic activity in each nucleus of the circuit (see Section 3).

The final system of differential equations is the result of an asymptotic
analysis with respect to a small parameter which is related to the relative
distribution of connections between the different nuclei of the circuit.



L. Vulkov, J. Wasniewski, and P. Yalamov (Eds.): NAA 2000, LNCS 1988, pp. 179—186, 2001. 
[Figure 1: diagram of the circuit. Legible labels: glutamate, GABA, dopamine, tonic activity, phasic activity, basal activity, subthalamic n., switch, input]

Fig. 1. The limbic system with its nuclei and the neurotransmitters involved



Certain functions play a fundamental role in the differential equations that
compose the system. They carry the nonlinearity of the system and are
associated with the activation ability of each nucleus. This activation ability
is in turn related to intrinsic characteristics of each nucleus and to the kind
of neurotransmitter that is released by its neurons.

Different forms of the model allow us to simulate the situations corresponding
to a prenatal lesion, a healthy individual and a recovered lesion, and to
compare the activity of the nuclei in each situation.



2 Unknown Functions and Model’s Parameters 

We start by defining the function whose dynamics describe the interaction
between two given nuclei through the synaptic activity that one nucleus exerts
on the other.

Given two nuclei $\Omega_i$, $\Omega_j$, we define $h^{\pm}_{ij}(x,t)$, for $x \in \Omega_j$, as the excitatory
(+) or inhibitory (-) density of synaptic activity at the location $x$ of the nucleus
$\Omega_j$ at the instant $t$, produced by the connections from the nucleus $\Omega_i$.
We also consider the following functions:

$R^{\pm}_{ij}(x', x, v)$ := the density of synaptic connections between the locations
$x' \in \Omega_i$ and $x \in \Omega_j$ from the circuit of neurons whose bodies are in $\Omega_i$;
through these connections travel action potentials with a specific density of
propagation velocities $v$, which produce on the neurons of $\Omega_j$ an excitatory (+)
or an inhibitory (-) effect.

The functions $R^{\pm}_{ij}$ keep macroscopic anatomical information about the circuit.

$g_i(x', t)$ := the fraction of all the neurons of the nucleus $\Omega_i$ which
fire action potentials at the instant $t$.

The functions $g_i$ describe the activation ability of each nucleus, and
consequently they keep functional information about the nuclei.

We assume that the decay of the connections (density of axons)
between each pair of nuclei is of exponential type and that there is only one
velocity of propagation of the action potentials, $V$. In this case we can represent:

$$R^{\pm}_{ij}(x', x, v) = a^{\pm}_{ij}(x')\, S_j(x)\, \delta(v - V) \qquad (1)$$

where $S_j(x)$ represents the density of synaptic contacts at the location
$x \in \Omega_j$, $a^{\pm}_{ij}$ is a normalization coefficient which we define later, and
$\delta$ is Dirac's delta function.

A pair $(i, j)$ such that the nucleus $\Omega_i$ produces synaptic activity on
the nucleus $\Omega_j$ is called an acceptable pair. We also write (+)(i, j) when the
synaptic activity is excitatory and (-)(i, j) when it is inhibitory.

In this case we have the following acceptable pairs:

(+)(2,1)  (+)(1,2)  (+)(1,3)  (+)(1,4)  (-)(6,5)  (-)(3,6)  (+)(1,7)
(+)(4,1)  (+)(4,2)  (+)(2,3)                      (+)(7,6)
(+)(5,1)            (-)(3,3)
                    (+)(4,3)

Note that this list does not include:

-The tonic activity that exerts over 
-The basal activity that exerts over 
-The outer excitatory or inhibitory inputs over 

We also consider lateral inhibition inside the nucleus $\Omega_3$, described by the function $h^{-}_{33}$.

3 Model of Interaction between Two Nuclei 

Considering that hfj{x,t) is expressed in (x', x, v) in sec{cm^)~'^ 

and $g_i(x',t)$ is dimensionless, then it is clear that

$$R^{\pm}_{ij}(x', x, v)\; g_i\!\left(x',\, t - \frac{|x - x'|}{v}\right) dx'\, dv \qquad (2)$$

represents the density of active synapses at the location $x \in \Omega_j$ produced by the
neurons located between the points $x'$ and $x' + dx'$ of $\Omega_i$, along whose fibers
travel action potentials with velocities between $v$ and $v + dv$ which arrive at
the location $x$ at the instant $t$, so we obtain:

$$h^{\pm}_{ij}(x,t) = \int_0^{\infty} dv \int_{\Omega_i} R^{\pm}_{ij}(x', x, v)\; g_i\!\left(x',\, t - \frac{|x - x'|}{v}\right) dx' \qquad (3)$$


If we substitute the expression (1) in (3) we obtain, 




for every acceptable pair (+)(i,j) or (-)(i,j).

It is clear that for every fixed $j$ and $x \in \Omega_j$ the following holds:

$$\int_0^{\infty} dv \int_{\Omega_i} R^{+}_{ij}(x', x, v)\, dx' + \int_0^{\infty} dv \int_{\Omega_i} R^{-}_{ij}(x', x, v)\, dx' = S_j(x) \qquad (5)$$

Substituting the expression (1) in (5) and assuming that $a^{\pm}_{ij}$ is constant, we arrive
at the following "normalization condition" for these coefficients,




2rij 



( 6 ) 



where $n_j$ is the total number of nuclei which interact with the
nucleus $j$: $n_1 = 3$, $n_2 = n_6 = 2$, $n_3 = 4$, $n_4 = n_5 = n_7 = 1$.
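
The counts of interacting nuclei can be checked against the list of acceptable pairs given in Section 2. The sketch below simply counts, for each target nucleus j, the pairs (i, j) in that list; the names `ACCEPTABLE_PAIRS` and `interacting_counts` are introduced only for this illustration.

```python
from collections import Counter

# Acceptable pairs (sign, i, j): nucleus i acts on nucleus j.
ACCEPTABLE_PAIRS = [
    ("+", 2, 1), ("+", 4, 1), ("+", 5, 1),
    ("+", 1, 2), ("+", 4, 2),
    ("+", 1, 3), ("+", 2, 3), ("-", 3, 3), ("+", 4, 3),
    ("+", 1, 4),
    ("-", 6, 5),
    ("-", 3, 6), ("+", 7, 6),
    ("+", 1, 7),
]

def interacting_counts(pairs):
    """Number of nuclei acting on each target nucleus j."""
    return Counter(j for _sign, _i, j in pairs)

counts = interacting_counts(ACCEPTABLE_PAIRS)
```

Note that the self-interaction (-)(3,3), the lateral inhibition, is counted among the four contributions to nucleus 3.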

Assuming that the density of glutamatergic and GABAergic fibers does not
depend on the nuclei connected by such fibers, we can define



Ai — AJ^ — Xi 2 — A42 — AJj — A43 — A]^4 — Ayg — A^^; 



A2 : — Agg — A'lifi — A 



36 



'33 



( 7 ) 

( 8 ) 



However, for the dopaminergic fibers we consider that A^g yf A ^3 and define: 



^3 A^3 

A4 := A^4 



Introducing the new unknown functions 



:= 



Sj{x) 



(9) 

( 10 ) 



( 11 ) 



and considering that such functions do not vary spatially inside each
nucleus, after defining



Ai R3 A2 « A3 — A, e — 

M 



(12) 
and making the variable change

Xv t = T 

we finally obtain the simplified equations 



^ j_ 9 V 

dr2 






dr 



H^. = — 






for all the pairs (i,j) which appear in the equations (7), (8) and (9). 



d?Hf dHf, 

,2 y ,,2e ^3 



dr'^ 



dr 



= 



9Kr) + e^{r) 



for the pair (2, 1) from the expression (10). 



( 13 ) 



( 14 ) 



(15) 



4 Model for the Limbic System 

According to the physiological information about the activation ability of each 
nucleus, we can write, 

Gl{u\) + {l-ui)Gl{ul) 



9Gt) = Gi(ui) = 

9^{t) = G2(u 2) = Gl(u^) 

gO(r) = G3 (u 3) = ^3(^3) + Gliuj) + Gi(ttj) + (1 - ui)Gi{uj) 

g°(r) = G4 (u4) = Gliul) 
glir) = GM = GKO + GUul) 



ffeW = Gg{ug) = 



Gliul) + Gliul) 



(16) 

(17) 

(18) 

(19) 

( 20 ) 

( 21 ) 

(22) 



= Griur) = G^(u^) 

where the functions $G_i^k$ describe the percentage of neurons of the nucleus
$\Omega_i$ which fire action potentials as a result of the partial activation $u_i^k$ of the
nucleus, defined by



1 ^^5+1+ ^ 2+1 

Ui - 



ul = H, 



Hi2 + H42 



41 



^3 — 2 



U3 = H43 
u\ = ~Hq5 



^76 -^36 



u\ = D COS^ (WT + 9 ) 
u\ = 

^,2 E^(t)-E_(t) 

“5 — 2 

ul = F cos 2 2 (or + </>) 



ul = H 



17 
The functions M 3 and Uq , normalized with 0 < < 1, represents the 

spontaneous activity from the nucleus 1?2 over and from over Qj respec- 
tively. The expressions i?+, E- in m| represents the excitatory an the inhibitory 
inputs to the circuit through the nucleus Also, we can take for a healthy 
individual, 



m{u) 

n{u) 

p{u) 

q{u) 



Gliu) = Gliu) 

GUu) = G\{u) 
1 



1 

1 -|- 2 

G1(m) = GJ(m) 



Gl(u) 

Gliu) 



1 



l + e3-5(i- 

Gliu) = 



1 _|_ gl.5(i-«) 



Gtiu) = 



^ 3 ("“ 3 ) = D (1 — cos^(wT -I- 6)) = Dsen^iiJT + 9) 
E+ {t)-E- 



Gliul) = 
Gliul) = F 
If we define. 



2 

1 -I- cos^ 2 (or -I- if) 



= F cos^(ar -I- p) 






^(0) = 



dr 



-ij 



= ff.f + H±" - Ls»(0) 






also consider the system. 






dr 



1 



+ —9^r) = Mbe" 



then we obtain the following 



(23) 

(24) 

(25) 

(26) 

(27) 

(28) 

(29) 

(30) 



Theorem 1. The model of the limbic circuit for a healthy individual is reduced
to the system (42), (43). This system is equivalent to the original one in the sense
that each solution of the original system (14), (15) satisfying the initial
conditions is equal to the solution of the system (42), (43) satisfying the
same initial conditions.
Fig. 2. The accumbens nucleus activity for a normal individual (low) and for
a pathological one (recovered lesion) (high)



By varying certain coefficients in some of the functions $G_j$ we obtain,
from (42), (43), the systems which represent the cases of prenatal and recovered
lesion.

The system (42), (43), consisting of 14 equations, can be reduced through
changes of variables and an asymptotic analysis to an equivalent system of seven
equations for small $\varepsilon$ and large $t$, from whose solutions we can write the
activation variables of each nucleus.

A comparative numerical analysis of the activation functions of the different
nuclei shows changes in the dynamics, in particular a hyperactivity
of the accumbens nucleus and a hypoactivity of the prefrontal cortex, in accordance
with the physiological hypothesis of dopamine sensitization and the
neurodevelopmental hypothesis for schizophrenia (Figs. 2, 3).

Figs. 2,3. In the horizontal scale, 2000 is equivalent to 100 msec. 
Abstract. Since their introduction in the 1980s, Krylov methods have been
the most heavily used methods for solving the two problems

$Ax = b$ and $Ax = \lambda x$, $x \neq 0$

where the matrix A is very large.

However, the understanding of their numerical behaviour is far from
satisfactory. We propose a radically new viewpoint for this longstanding
enigma, which shows mathematically that the Krylov-type method works
best when it is most ill-conditioned.



1 Introduction 

Krylov methods have been, since their introduction in the 80s by Y. Saad, widely 
used worldwide to solve large scale problems such as 

$Ax = b$ and $Ax = \lambda x$, $x \neq 0$

which are the two basic problems associated with a large (often sparse) matrix A. 
Despite this widespread use, the understanding of their finite precision behaviour 
is far from satisfactory. The consequence is that even the best codes include 
heuristics and their convergence is not guaranteed. 

Such a state of affairs is intellectually frustrating and, until now, the Krylov 
methods continue to challenge the best experts. This paper presents the pro- 
gramme undertaken at Cerfacs in the Qualitative Computing Group since 1996, 
which looks at the Krylov methods in a completely new and original way [2].

2 The First Step: The Basic Arnoldi Algorithm for the 
Hessenberg Decomposition 

2.1 Irreducibility of H?

Classically, the Hessenberg decomposition 

$A = V H V^{*}$, $H$ Hessenberg, $V V^{*} = V^{*} V = I$ (1)

is considered under the assumption that H is irreducible. 

* CERFACS Technical Report TR/PA/00/40 

L. Vulkov, J. Wasniewski, and P. Yalamov (Eds.): NAA 2000, LNCS 1988, pp. 187—197, 2001. 
The main reason is that, from a mathematical standpoint, if H is not
irreducible, it can be partitioned by means of two or more irreducible Hessenberg
submatrices which are on the diagonal. The spectrum of A is the union of the
corresponding spectra. The reasoning is, of course, impeccable in exact arithmetic.
But, as we shall see, it may be misleading for finite precision computation, where
no 0 is exact on the subdiagonal of H, hence no exact partitioning can be done
in practice.

The explicit assumption that 

H is irreducible (2) 

leads to very strong assumptions on H, hence on A: 

a) if A is diagonalizable, then it should have simple eigenvalues, and 

b) if A is defective, it should be non derogatory (that is, it should not have 
more than 1 eigenvector per distinct eigenvalue). 

The assumption (2) is unrealistic because it artificially excludes matrices with
multiple eigenvalues which are either diagonalizable or defective and
derogatory. Moreover, when one is given a matrix which is exactly reducible, a simple
backward analysis shows that $A + \Delta A$ will almost always fulfill (2).

Imposing the condition (2) on H (or A) seems therefore unreasonable. It led 
to the widespread belief that the Arnoldi (or Lanczos) algorithm cannot compute 
multiple eigenvalues in finite precision. However, anyone can easily experience 
the contrary on a workstation. 



2.2 Happy Breakdown of the Algorithm 

We define $H = (h_{ij})$, $H_k$ being the $k \times k$ upper left submatrix, and
$V_k = [v_1, \ldots, v_k]$ an orthonormal basis for the Krylov subspace

$\mathrm{span}\{v_1, Av_1, \ldots, A^{k-1}v_1\}$ for $k = 1, \ldots, n$.

Another seemingly pressing reason to assume irreducibility for H a priori
is that, if $h_{k+1,k} = 0$ at step $k < n$ (in exact arithmetic), the mathematical
algorithm stops: the vector $Av_k$ and the previous orthogonal vectors $\{v_1, \ldots, v_k\}$
are linearly dependent. The Krylov subspace defined by A and $v_1$ has
dimension $k$ and cannot be expanded. Mathematically, the vectors $v_{k+1}, \ldots, v_n$
are not defined. It is well known that the Arnoldi algorithm realizes recursively
a QR factorization of the sequence of rectangular matrices of size $n \times (k+1)$

$$B_{k+1} = [v_1, AV_k] = V_{k+1} \left[\, e_1,\ \begin{pmatrix} H_k \\ 0 \,\cdots\, 0 \;\; h_{k+1,k} \end{pmatrix} \right] = V_{k+1} R_{k+1} \qquad (3)$$

for $k = 1, \ldots, n-1$, where the factor $R_{k+1}$ is upper triangular of order $k+1$.
For $k = n$, one has the particular formula

$$B_{n+1} = [v_1, AV_n] = V_n [e_1, H_n] \qquad (4)$$

The QR factorization can be implemented via i) a Gram-Schmidt type algorithm
or ii) the Householder algorithm.

When $h_{k+1,k} = 0$, what happens algorithmically in exact arithmetic? The
answer depends on the orthogonalization strategy:

i) stop because of division by 0 (for classical as well as modified GS);

ii) continue: division by 0 is avoided and a vector $v_{k+1}$ orthogonal to $v_1, \ldots, v_k$
is computed, which initiates a new Krylov subspace.

In finite precision, however, the computed value of $h_{k+1,k}$ is $\neq 0$ and all three
implementations above involve a division by a small quantity, i.e. the computed value
of $h_{k+1,k} = \|Av_k - \sum_{i=1}^{k} h_{ik} v_i\|$ (= 0 in exact arithmetic).

The event $h_{k+1,k} = 0$ is a singularity for the algorithm: it signals a rank
deficiency in the Krylov basis for A initiated with $v_1$ (dimension $k < n$). The
algorithmic computation can be ill-conditioned in the neighborhood of the
singularity because of the possible division by a small quantity. It is interesting to
remark that Householder copes with the singularity only in exact arithmetic. The
interested reader is referred to [4], where the sensitivity of the Arnoldi algorithm
to the starting vector $v_1$ is studied by means of condition numbers for $h_{k+1,k}$
and $v_{k+1}$.

The singularity $h_{k+1,k} = 0$ has another feature: in exact arithmetic, its
occurrence makes it possible to obtain at step $k$ an exact solution for $Ax = b$ and
a subset of $k$ exact eigenelements for $Ax = \lambda x$, $x \neq 0$. This is why the event
$h_{k+1,k} = 0$ has been dubbed "happy breakdown" by software developers [7] [8].
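
The happy breakdown can be observed on a small example. The sketch below is a plain Arnoldi process with modified Gram-Schmidt, given as an illustration only: the function name `arnoldi`, the absolute tolerance `tol` used to decide that $h_{k+1,k}$ is negligible, and the test data are all assumptions of this example. The starting vector lies in a two-dimensional invariant subspace of A, so the process stops at step 2.

```python
import numpy as np

def arnoldi(A, v1, tol=1e-12):
    """Arnoldi with modified Gram-Schmidt.

    Returns (V_k, H_k, k), stopping at the first step k where the computed
    h_{k+1,k} is below tol (a happy breakdown), or at k = n otherwise.
    """
    n = A.shape[0]
    V = np.zeros((n, n + 1), dtype=complex)
    H = np.zeros((n + 1, n), dtype=complex)
    V[:, 0] = v1 / np.linalg.norm(v1)
    for k in range(n):
        w = A @ V[:, k]
        for i in range(k + 1):                 # orthogonalize against v_1..v_{k+1}
            H[i, k] = np.vdot(V[:, i], w)
            w = w - H[i, k] * V[:, i]
        H[k + 1, k] = np.linalg.norm(w)
        if abs(H[k + 1, k]) <= tol:            # computed h_{k+1,k} is negligible
            return V[:, :k + 1], H[:k + 1, :k + 1], k + 1
        V[:, k + 1] = w / H[k + 1, k]
    return V[:, :n], H[:n, :n], n

# Illustrative data: v1 is a combination of two eigenvectors of a diagonal matrix.
A = np.diag([1.0, 2.0, 3.0, 4.0]).astype(complex)
b = np.array([1.0, 1.0, 0.0, 0.0], dtype=complex)
V, H, k = arnoldi(A, b)

# At the breakdown, eigenvalues of H_k are eigenvalues of A, and
# x = V_k H_k^{-1} (||b|| e_1) solves Ax = b.
ritz = np.sort(np.linalg.eigvals(H).real)
rhs = np.zeros(k, dtype=complex)
rhs[0] = np.linalg.norm(b)
x = V @ np.linalg.solve(H, rhs)
```

In floating-point arithmetic the computed $h_{k+1,k}$ is a small nonzero number rather than an exact zero, which is why a tolerance is needed here; the choice of that tolerance is exactly the kind of heuristic the text refers to.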

It is clear that the singular event $h_{k+1,k} = 0$ occurs for $k < n$. The usual
assumption (2) forces the event to occur as late as possible, that is for $k = n$.
However, in software practice, as we shall see in the next section, one wants the
event to occur as soon as possible, for $k$ as small as possible. This fact explains
why the (often implicit) assumption (2) is most unfortunate, since it forbids
the occurrence of the most wanted event $h_{k+1,k} = 0$ for $k$ very small with respect
to $n$. Therefore, it fails to provide the appropriate conceptual framework to study
the "convergence" of practical Krylov methods, which is the topic of the next
section.

Remark 1. The particular factorization of $B_{n+1} = [v_1, AV_n]$, given in (4), shows 
that the final step $k = n$ of the Arnoldi algorithm can be interpreted as the 
singular event $h_{n+1,n} = 0$. This is a consequence of the fact that $A$ is a matrix 
of finite order $n$. If, more generally, we think of $A$ as an operator in a 
functional space, then the above algorithmic process would continue endlessly, as 
long as $h_{k+1,k} \neq 0$ [1]. 

Consequently, in the matrix case, there is always, in exact arithmetic, at least 
one singular event for $k = n$, and maybe one or several additional ones for $k < n$. 
The main difference between these two kinds of singularities is that the first is 
known to occur exactly for $k = n$, whereas for the second, one does not know in 
advance whether and when it may occur. 

It is this lack of information which accounts for the fact that the complete 
Arnoldi-Householder algorithm can be ill-conditioned if a singular event occurs 
before $k = n$ (that is, if an early happy breakdown occurs). 

3 Iterative or Restarted Version of the Incomplete 
Arnoldi Algorithm 

For very large matrices, the Arnoldi algorithm is not allowed to run until 
completion, that is until $k = n$, mainly for practical reasons of cost. A maximal 
size $m \ll n$ for the Krylov subspace is imposed, either fixed a priori or 
determined dynamically. The exact information for $k = n$ can therefore never become 
available. 

To compensate for that limitation, one uses the incomplete algorithm 
iteratively: it is restarted for another set of $m$ steps with a new starting vector $v_1^{(i)}$, 
which is carefully computed from the information available from $H_m$ at the 
previous iteration. The new $v_1^{(i)}$ is computed with an "early happy 
breakdown" in mind. 

The idea is to enrich the starting vector $v_1^{(i)}$ at each new iteration with 
information about the desired solution (of $Ax = b$ or $Ax = \lambda x$) which has 
been computed during the $(i-1)$th incomplete Arnoldi iteration, $i = 1, 2, \ldots$ 

Example 1. Suppose that the $r$ (simple) eigenvalues $\mu_j$ of $A$ which lie in a given 
region of $\mathbb{C}$ are wanted. Suppose that $v_1$ belongs to the invariant subspace of $A$ 
associated with the $\mu_j$; then the Krylov subspace generated by $A$ and $v_1$ has 
dimension $r \le n$. Starting from any such $v_1$, there is a happy breakdown for 
$k = r$ at most. In practice, one does not know such a $v_1$. One aims at computing 
a sequence of starting vectors $v_1^{(i)}$, $i = 1, 2, \ldots$, which will progressively converge 
towards such a $v_1$, which contains exactly the information which is sought. 

This exemplifies the rationale behind restarted versions of the Arnoldi 
algorithm which are known under the generic name of Krylov methods. The 
convergence of $v_1^{(i)}$ towards $v_1$ as $i$ increases is monitored by the backward error on $A$ 
associated with approximate solutions computed from the current Hessenberg 
matrix. 

The previous example makes it clear that such backward errors can be small 
(with respect to machine precision) only if $v_1^{(i)}$ is close enough to such a vector 
$v_1$. Therefore: 

Convergence of Krylov methods is best understood in the light of an 
early happy breakdown. 

One sees fully now why the assumption (2) of irreducibility of H in the decom- 
position (1) goes against the appropriate mathematical framework to analyse 
the convergence of Krylov methods. 






4 Detection of the Singular Event $h_{k+1,k} = 0$, $k < n$, in 
Finite Precision 

As already indicated, the singular event $h_{k+1,k} = 0$ cannot easily be detected in 
finite precision because the computed value of $h_{k+1,k}$ is nonzero and may be too 
large (due to ill-conditioning) to be considered as zero. This is a serious difficulty 
which is one of the major keys to unlock the analysis of convergence of Krylov 
methods. 

We start our study by going back to the mathematical meaning of a singular 
event. 



4.1 A Krylov Basis of Dimension k < n 

The event $h_{k+1,k} = 0$ means that the vectors $v_1, \ldots, v_k$ and $Av_k$ are linearly 
dependent. Therefore, the matrix $R_{k+1}$ in the factorization (3) is singular: 
indeed $h_{k+1,k}$ is its $(k+1)$th diagonal element. In order to quantify the distance 
to singularity of any computed $R_{j+1}$, $j = 2, \ldots, m$, one can compute 

$$\frac{1}{\mathrm{cond}_2(R_{j+1})} = \frac{\sigma_{\min}(R_{j+1})}{\sigma_{\max}(R_{j+1})} = \frac{\mathrm{dist}(R_{j+1}, \text{singularity})}{\|R_{j+1}\|_2}$$

in the 2-norm. 
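
The identity between the inverse condition number and the relative distance to singularity can be checked numerically. The following is a minimal sketch in Python with NumPy, not part of the original paper; the helper names `inverse_cond2` and `nearest_singular` and the test matrix are hypothetical and chosen only for illustration.

```python
import numpy as np

def inverse_cond2(R):
    """Return sigma_min(R) / sigma_max(R), i.e. 1 / cond_2(R)."""
    s = np.linalg.svd(R, compute_uv=False)  # singular values, largest first
    return s[-1] / s[0]

def nearest_singular(R):
    """Closest singular matrix to R in the 2-norm: drop the smallest singular triplet."""
    U, s, Vh = np.linalg.svd(R)
    return R - s[-1] * np.outer(U[:, -1], Vh[-1, :])

# Hypothetical test matrix: upper triangular with one tiny diagonal entry,
# mimicking a computed R factor close to a singular event.
rng = np.random.default_rng(0)
R = np.triu(rng.standard_normal((5, 5)))
R[4, 4] = 1e-9

S = nearest_singular(R)
rel_dist = np.linalg.norm(R - S, 2) / np.linalg.norm(R, 2)
print(rel_dist, inverse_cond2(R))  # the two numbers should agree
```

The perturbation `R - S` has 2-norm equal to the smallest singular value of `R`, which is why dividing by the 2-norm of `R` reproduces the inverse condition number.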

4.2 The Method Error in the Incomplete Arnoldi Algorithm 

At step $k = 1, \ldots, n-1$, the following identity holds: 

$$AV_k = V_k H_k + h_{k+1,k}\, v_{k+1} e_k^*, \qquad (5)$$

or equivalently: 

$$(A - h_{k+1,k}\, v_{k+1} v_k^*) V_k = V_k H_k. \qquad (6)$$

$V_k$ is an orthonormal basis of an invariant subspace of the perturbed matrix $A_k' = A - h_{k+1,k} E_k$, where 
the deviation $E_k$ is of rank one: $E_k = v_{k+1} v_k^*$. The $k$ 
eigenvalues of $H_k$ are a subset of the spectrum of $A_k'$. The form of the identity (6) 
calls for an interpretation in terms of homotopic perturbations of the type 
$tE_k$, $t \in \mathbb{C}$ (see [6] [9]). It is easily shown that $|h_{k+1,k}|$, which is the (absolute) 
homotopic backward error, can be interpreted in terms of the (absolute) method 
error incurred when one wishes to represent $A$ by its projected matrix $H_k$. As 
a by-product of this analysis, all $k$ eigenvalues of $H_k$ have the same relative 
homotopic backward error $|h_{k+1,k}|/\|A\|_2$. 
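
Identities (5) and (6) can be verified on a small example. The sketch below is an illustration, not the authors' code: it assumes a real matrix, a modified Gram-Schmidt implementation, and a hypothetical function name `arnoldi_mgs`; the random test matrix is likewise an assumption.

```python
import numpy as np

def arnoldi_mgs(A, v1, k):
    """k steps of Arnoldi with modified Gram-Schmidt (real arithmetic).

    Returns V (n x (k+1)) and the (k+1) x k Hessenberg matrix H.
    """
    n = A.shape[0]
    V = np.zeros((n, k + 1))
    H = np.zeros((k + 1, k))
    V[:, 0] = v1 / np.linalg.norm(v1)
    for j in range(k):
        w = A @ V[:, j]
        for i in range(j + 1):
            H[i, j] = V[:, i] @ w
            w = w - H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        V[:, j + 1] = w / H[j + 1, j]  # division by a possibly small quantity
    return V, H

rng = np.random.default_rng(1)
n, k = 8, 4
A = rng.standard_normal((n, n))
V, H = arnoldi_mgs(A, rng.standard_normal(n), k)

Vk, Hk = V[:, :k], H[:k, :k]
h = H[k, k - 1]                       # h_{k+1,k}
e_k = np.eye(k)[k - 1]                # k-th canonical vector
E = np.outer(V[:, k], V[:, k - 1])    # E_k = v_{k+1} v_k^*

res5 = np.linalg.norm(A @ Vk - Vk @ Hk - h * np.outer(V[:, k], e_k))
res6 = np.linalg.norm((A - h * E) @ Vk - Vk @ Hk)
beta_k = abs(h) / np.linalg.norm(A, 2)  # relative homotopic backward error
print(res5, res6, beta_k)
```

Both residuals should be at the level of rounding errors, while `beta_k` measures how far $H_k$ is from representing $A$ exactly.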

Remark 2. If one uses the Householder orthogonalization, the scalar $h_{k+1,k}$ is 
not guaranteed to be real nonnegative. 

4.3 The Stopping Criterion in Software Practice 

The mathematical analysis has provided two possible quantities to detect a 
singular event: 

i) $\alpha_k = 1/\mathrm{cond}_2(R_{k+1})$ and 
ii) $\beta_k = |h_{k+1,k}|/\|A\|_2$. 

Software developers, on the other hand, control the convergence by means of a 
third quantity based on the Arnoldi residual, that we write here in the case of 
an eigenproblem $Ax = \lambda x$: 

iii) $\gamma_k = |h_{k+1,k}|\,|y_k| / (\|A\|_2 \|y\|_2)$, 

where $y$ can be any eigenvector of $H_k$ and $y_k = e_k^* y$, $e_k$ being the $k$th canonical 
vector in $\mathbb{R}^k$. 

It is clear that $\gamma_k \le \beta_k$ since 

$$\gamma_k = \beta_k \frac{|y_k|}{\|y\|_2} \quad \text{and} \quad \frac{|y_k|}{\|y\|_2} \le 1.$$

To $\gamma_k$, which represents the relative normwise backward error associated with the 
pair $(\mu, z = V_k y)$, where $H_k y = \mu y$ in $\mathbb{R}^k$, we propose to add the relative normwise 
backward error associated with the scalar $\mu$ only, that is 

iv) $\delta_k = 1/(\|A\|_2 \|(A - \mu I)^{-1}\|_2)$, 

which expresses the relative distance of $A - \mu I$ to singularity. Note that $\delta_k \le \gamma_k$. 
We discuss the numerical behaviour of these four indicators in the next 
section. We shall see that it is useful to consider the variant 
$\gamma_k' = \|Az - \mu z\|_2 / (\|A\|_2 \|z\|_2)$ of $\gamma_k$, which is equal to $\gamma_k$ in exact arithmetic. 
However, once convergence has been reached, a numerical artefact takes place if 
the algorithm is left running: $\gamma_k$ can spuriously decrease, whereas $\gamma_k'$ remains of 
the order of machine precision, as it should [3]. 

5 A Numerical Illustration 

We consider, as a numerical example, the matrix Rose [3] which is the companion 
matrix $A$ of the polynomial 

$$p(x) = (x - 1)^3 (x - 2)^3 (x - 3)^3 (x - 4).$$

The matrix is defective and nonderogatory, with 3 multiple defective eigenvalues of 
multiplicity 3 equal to their index. 

The Jordan form of $A = XJX^{-1}$ (with eigenvalues in the order 1, 2, 3, 4) is 
known and the starting vector $v_1$ is chosen such that a singular event occurs for 
$k = 3$. The starting vector $v_1$ is of the form: 

It is clear that $u$ belongs to the invariant subspace associated with 1. Three values 
for $p$ are selected: $p = 0$, 3 and 5, yielding 3 different starting vectors such that 
$k = 3$. Three orthogonalization strategies are chosen to implement the Arnoldi 
algorithm: 

1) classical Gram-Schmidt (CGS) (marked below with o) 

2) modified Gram-Schmidt (MGS) (marked below with +) 

3) Householder (H) (marked below with ×) 

In order to fully compare the respective behaviours of Arnoldi in exact arithmetic 
and in finite precision in the presence of a singular event, the algorithm is run 
until completion, that is $k = 10$. 
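
To make the indicators concrete, the sketch below builds a companion matrix and evaluates $\beta_k$, $\gamma_k$ and $\gamma_k'$ after a few Arnoldi steps. It is an illustration under explicit assumptions, not the experiment of the paper: the root multiplicities (3, 3, 3, 1) are inferred from the order $n = 10$, the starting vector is random rather than the special $v_1$ used by the authors, only the MGS variant is shown, and the names `companion` and `arnoldi_mgs` are hypothetical.

```python
import numpy as np

def companion(coeffs):
    """Companion matrix of a polynomial given by coefficients, highest degree first."""
    c = np.asarray(coeffs, dtype=float)
    n = len(c) - 1
    C = np.zeros((n, n))
    C[0, :] = -c[1:] / c[0]
    C[np.arange(1, n), np.arange(0, n - 1)] = 1.0
    return C

def arnoldi_mgs(A, v1, k):
    """k steps of Arnoldi with modified Gram-Schmidt (real arithmetic)."""
    n = A.shape[0]
    V = np.zeros((n, k + 1))
    H = np.zeros((k + 1, k))
    V[:, 0] = v1 / np.linalg.norm(v1)
    for j in range(k):
        w = A @ V[:, j]
        for i in range(j + 1):
            H[i, j] = V[:, i] @ w
            w = w - H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        V[:, j + 1] = w / H[j + 1, j]
    return V, H

# Assumed multiplicities: (x-1)^3 (x-2)^3 (x-3)^3 (x-4), order 10.
roots = [1, 1, 1, 2, 2, 2, 3, 3, 3, 4]
A = companion(np.poly(roots))
normA = np.linalg.norm(A, 2)

k = 4
rng = np.random.default_rng(0)          # random start: an assumption
V, H = arnoldi_mgs(A, rng.standard_normal(10), k)
Vk, Hk, h = V[:, :k], H[:k, :k], H[k, k - 1]

beta = abs(h) / normA
mu, Y = np.linalg.eig(Hk)               # Ritz values and eigenvectors of H_k
gammas, gammas_prime = [], []
for j in range(k):
    y = Y[:, j]
    z = Vk @ y                          # Ritz vector z = V_k y
    gammas.append(abs(h) * abs(y[-1]) / (normA * np.linalg.norm(y)))
    gammas_prime.append(np.linalg.norm(A @ z - mu[j] * z)
                        / (normA * np.linalg.norm(z)))
print(beta, gammas, gammas_prime)
```

Before convergence the two residual-based indicators should coincide up to rounding, and each is bounded by `beta`.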

We plot the curves $k \mapsto \alpha_k$, $\beta_k$, $\gamma_k$, $\gamma_k'$ and $\delta_k$ for the three implementations. 
See Figures 1 to 5. 

To analyse the accuracy obtained for the three computed eigenvalues $\mu_i$ which 
are close to 1, we look at the errors $e_k = \max_{i=1,2,3} |\mu_{ik} - 1|$. 

The accuracy history is summarized by the plot $k \mapsto e_k$. See Figure 6. 

The following conclusions can be drawn from this numerical example. 

1) Detection of the early happy breakdown. 

$\alpha_3$ is not sensitive to $p$ whereas $\beta_3$ is (10“^^, 10“^°, 10“®). See Figures 1 and 
2. $\gamma_k$ continues to decrease after machine precision has been reached (for $k > 5$), 
which is not the case for $\gamma_k'$. This latter indicator $\gamma_k'$ is therefore more reliable. 
See Figures 3 and 4. 

2) Accuracy of computation. 

The backward error $\delta_k$ on $\mu$ reaches $10^{-15}$ at least for $k > 5$ (see Figure 5). 
The direct error $e_k$ (in Figure 6) is of the order of $10^{-3}$, which is in agreement 
with the Hölderian condition number of $\lambda = 1$ as a triple eigenvalue, which is equal 
to 374 [5]: 

$$374 \times (10^{-15})^{1/3} \approx 4 \times 10^{-3}.$$

The variation of the results with $p$ indicates that the value $k = 3$ for the singular 
event in exact arithmetic can be seen as $k = 1$ in finite precision when $p$ is large 
enough: this reflects the fact that $v_1$ is then close to the eigenvector of 1. This fact 
is clearly seen in Figures 7 and 8, which represent the plots $p \mapsto |h_{21}|/\|A\|_2$ 
and $p \mapsto |h_{43}|/\|A\|_2$ for $p$ ranging from −15 to 15. 

Fig. 1: $\alpha_k$ 

Fig. 2: $\beta_k$ 

Fig. 3: $\gamma_k$ 

Fig. 4: $\gamma_k'$ 

Fig. 5: $\delta_k$ 

Fig. 6: $e_k$ 

Abstract. This paper shows that the residual sum of squares of 
Band-TAR models is a rational function of degree (4,2) of the threshold 
parameter. Building on this result a novel fitting approach is proposed which 
permits a continuous threshold space and employs QR factorizations and 
Givens updating. Its efficiency gains over a standard grid search are 
illustrated by Monte Carlo analysis. 



1 Introduction 

Threshold autoregressive (TAR) models (Tong 1983) have been widely applied 
in recent years to capture the nonlinear behavior of economic and financial 
variables. An $m$-regime TAR can be written as 

$$z_t = \sum_{i=1}^{m} \left(\phi_0^i + \phi_1^i z_{t-1} + \cdots + \phi_{p_i}^i z_{t-p_i}\right) I(\theta^{i-1} < v_{t-d} \le \theta^i) + \varepsilon_t, \qquad (1)$$

for $t = 1, \ldots, N$, where $\varepsilon_t \sim iid(0, \sigma^2)$, $I(\cdot)$ is the indicator function, $-\infty = \theta^0 < 
\theta^1 < \ldots < \theta^m = \infty$ are threshold parameters, and $p_i$ and $d$ are the autoregressive 
(AR) and threshold lag orders, respectively. This is a nonlinear model in time 
but piecewise linear in the threshold space. It partitions the one-dimensional 
Euclidean space into $m$ regimes, each of which is defined by an AR model, 
depending on the values taken by the threshold or switching variable, $v_{t-d}$. 
A particular case of (1) is the following Band-TAR model 

$$\Delta z_t = A(t, \theta)^- I_t(v_{t-d} < -\theta) + B(t) I_t(|v_{t-d}| \le \theta) + A(t, \theta)^+ I_t(v_{t-d} > \theta) + \varepsilon_t, \qquad (2)$$

with 

$$A(t, \theta)^- = \alpha_1(z_{t-1} + \theta) + \alpha_2(z_{t-2} + \theta) + \ldots + \alpha_p(z_{t-p} + \theta),$$

$$A(t, \theta)^+ = \alpha_1(z_{t-1} - \theta) + \alpha_2(z_{t-2} - \theta) + \ldots + \alpha_p(z_{t-p} - \theta),$$

$$B(t) = \beta_0 + \beta_1 z_{t-1} + \beta_2 z_{t-2} + \ldots + \beta_q z_{t-q},$$

where $\theta > 0$ is an identifying restriction. Under particular stationarity 
conditions, (2) characterizes a process $z_t$ that converges to the boundaries of the 
inner band, which act as attractors. More specifically, the process mean-reverts 
to $\theta$ when $v_{t-d} > \theta$ and to $-\theta$ for $v_{t-d} < -\theta$. It generalizes the Band-TAR 
with $p = q = 1$ introduced by Balke and Fomby (1997), which is self-exciting 
($v_{t-d} = z_{t-d}$) and assumes a random-walk inner band ($\beta_0 = \beta_1 = 0$). 

* We are grateful to Erricos Kontoghiorghes and an anonymous referee for helpful 
comments. 

Band-TARs have been applied to capture asymmetries, limit cycles and jump 
phenomena in the behavior of financial and economic variables (Coakley and 
Fuertes 1997; Obstfeld and Taylor 1997). The fitting approach proposed in this 
paper can be easily adapted to a number of related specifications such as the 
continuous (C-) TAR model of Chan and Tsay (1998): 

$$\Delta z_t = \phi_1 (z_{t-1} - \theta) I_t + \phi_2 (z_{t-1} - \theta)(1 - I_t) + \sum_j \gamma_j \Delta z_{t-j} + \varepsilon_t, \qquad I_t = \begin{cases} 1 & \text{if } v_{t-1} > 0 \\ 0 & \text{otherwise} \end{cases} \qquad (3)$$

where $v_{t-1} = z_{t-1} - \theta$. Under the stationarity condition $-2 < (\phi_1, \phi_2) < 0$, the 
latter represents a process with differential adjustment towards the attractor $\theta$ 
depending on the sign of the past deviation. 

Our goal is to fit model (2) to the observed time series $\{z_t\}_{t=1}^{N}$ and $\{v_t\}_{t=1}^{N}$. 
Ordinary least squares (LS) or, equivalently, conditional maximum likelihood 
(ML) under Gaussian innovations, lead to the minimization of the following 
residual sum of squares (RSS) function 

$$RSS(\phi) = \sum_{t}^{n} (\Delta z_t - A(t, \theta)^-)^2 I_t(v_{t-d} < -\theta) + \sum_{t}^{n} (\Delta z_t - B(t))^2 I_t(|v_{t-d}| \le \theta) + \sum_{t}^{n} (\Delta z_t - A(t, \theta)^+)^2 I_t(v_{t-d} > \theta)$$

with respect to $\phi = (\theta, \alpha', \beta', d, p, q)'$, where $\alpha = (\alpha_1, \ldots, \alpha_p)'$ and $\beta = (\beta_0, \ldots, \beta_q)'$ 
are the outer- and inner-band AR parameters, respectively, and $n = N - 
\max(d, p, q)$ is the effective sample size. 

Let us assume initially that $(d, p, q)$ are known. Our goal is to estimate 
$(\theta, \alpha', \beta')$. The above RSS function is discontinuous in $\theta$, implying that 
standard gradient-based algorithms cannot be applied. If the threshold space $\Theta$ is 
small, a simple grid search (GS) can be effectively used to find the value $\theta \in \Theta$ 
that minimizes the RSS (or some LS-related criterion) or maximizes the 
log-likelihood function. The estimates of the parameters $\alpha$ and $\beta$ can be computed 
by standard LS conditional on $\theta$. 

The threshold space is the continuous region $\Theta \subset \mathbb{R}^+$. However, in practice 
the GS is applied to a feasible (discrete) range in $\Theta$ by fixing a number of 
threshold candidates which are usually the sample percentiles (or order statistics) 
of $v_{t-d}$, that is $\Theta_{(\tau)} = \{v_{(\tau_0)} < \cdots < v_{(i)} < \cdots < v_{(\tau_1)}\} \subset \Theta$, where $v_{(\tau_0)}$ and 
$v_{(\tau_1)}$ are some bounds required to guarantee that each regime contains a minimum 
number of observations for the submodels to be estimable. However, since in 
principle any point in the continuous threshold space could maximize the 
log-likelihood, a full or detailed GS using $\Theta_{(\lambda)} = \{v_{(i)} < \theta_i^j < v_{(i+1)} : \theta_i^{j+1} = 
\theta_i^j + \lambda,\ j = 1, 2, \ldots\} \cup \Theta_{(\tau)} \subset \Theta$, where $\lambda$ is a step size, is preferable to a GS 
restricted to $\Theta_{(\tau)}$. While a potential pitfall of the latter is that it may yield rather 
imprecise parameter estimates for small $N$, a practical problem with $\Theta_{(\lambda)}$ is that 
it may prove computationally very expensive for small step size $\lambda$ when the data 
are widely dispersed. This calls for an estimation method capable of handling 
a continuous threshold range while keeping costs within tractable limits. The 
numerical algorithm proposed in this paper is in this spirit. 
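
For reference, the plain grid search over sample percentiles described above can be sketched as follows. This is a simplified illustration and not the authors' GAUSS code: it assumes a self-exciting Band-TAR with $p = q = 1$ and $d = 1$, a random-walk inner band in the simulated data, and hypothetical function names (`simulate_band_tar`, `rss_given_theta`, `grid_search`) and parameter values.

```python
import numpy as np

def simulate_band_tar(n, theta, alpha, sigma, rng):
    """Simulate a self-exciting Band-TAR (p = 1, d = 1) with a random-walk inner band."""
    z = np.zeros(n)
    for t in range(1, n):
        prev = z[t - 1]
        if prev > theta:
            drift = alpha * (prev - theta)   # reverts towards +theta
        elif prev < -theta:
            drift = alpha * (prev + theta)   # reverts towards -theta
        else:
            drift = 0.0
        z[t] = prev + drift + sigma * rng.standard_normal()
    return z

def rss_given_theta(z, theta):
    """Conditional LS fit for a fixed threshold; returns the total RSS."""
    dz, lag = np.diff(z), z[:-1]
    upper, lower = lag > theta, lag < -theta
    outer = upper | lower
    inner = ~outer
    # Outer regimes share one slope; the regressor is the deviation from the band edge.
    x_out = np.where(upper, lag - theta, lag + theta)[outer]
    y_out = dz[outer]
    alpha_hat = (x_out @ y_out) / (x_out @ x_out)
    rss_out = np.sum((y_out - alpha_hat * x_out) ** 2)
    # Inner regime: intercept plus one lag.
    X_in = np.column_stack([np.ones(inner.sum()), lag[inner]])
    beta_hat = np.linalg.lstsq(X_in, dz[inner], rcond=None)[0]
    rss_in = np.sum((dz[inner] - X_in @ beta_hat) ** 2)
    return rss_out + rss_in

def grid_search(z, lo=15, hi=85, num=71):
    """Search over percentiles of |z_{t-1}| so that each regime keeps enough cases."""
    candidates = np.percentile(np.abs(z[:-1]), np.linspace(lo, hi, num))
    fits = [(rss_given_theta(z, th), th) for th in candidates]
    return min(fits)

rng = np.random.default_rng(0)
z = simulate_band_tar(1500, theta=0.5, alpha=-0.6, sigma=0.2, rng=rng)
best_rss, best_theta = grid_search(z)
print(best_theta, best_rss)
```

Each candidate threshold requires a fresh least squares fit, which is the cost that grows when the grid is refined with a small step size.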

The organization of the paper is as follows. In §2 the Band-TAR is stated 
in arranged form to facilitate efficient estimation. The main results are given in 
§3. The proposed fitting approach is outlined in §4 and its efficiency gains are 
explored via a small Monte Carlo simulation. A final section concludes. 

2 Arranged Autoregression 

For $p = q = L$, where $L$ is a fixed upper bound, the observed data can 
be represented in AR form as $y = f(X) + \varepsilon$, $X = (x_1, x_2, \ldots, x_L)'$, where $y$ and 
$x_j$, $j = 1, \ldots, L$, are $n$-vectors containing the sample observations of the variables 
$\Delta z_t$ and $z_{t-j}$, respectively, $\varepsilon$ is a disturbance $n$-vector and $n = N - \max(d, L)$ 
is the effective sample size. This formulation can easily be transformed into a 
change-point (or Band-TAR) problem by rearranging its cases (rows) according 
to the ordered threshold variable $v_{t-d}$, yielding $y^v = f(X^v) + \varepsilon^v$. 

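Let $\theta = \theta_k$ ($\theta_k > 0$) be a plausible threshold value such that two 
indices, $k_1$ and $k_2$ ($k_1 < k_2$), are associated with it satisfying $v_{(i)} < -\theta_k$ for 
$i = 1, 2, \ldots, k_1$, $v_{(i)} > \theta_k$ for $i = k_2, k_2 + 1, \ldots, n$, and $-\theta_k \le v_{(i)} \le \theta_k$ for 
$i = k_1 + 1, \ldots, k_2 - 1$. Using the above ordered-form notation the $s = k_2 - k_1 - 1$ 
cases classified into the inner regime of (2) can be written as 

$$\Delta z_s = Z_s^v \beta + \varepsilon_s, \qquad (4)$$

and the $r = n - (k_2 - k_1 - 1)$ cases in the outer regime can be written as $\Delta z_r = 
Z_r^v(\theta_k)\alpha + \varepsilon_r$, where: 

$$Z_r^v(\theta_k) = \begin{pmatrix}
x_{11}^v + \theta_k & x_{12}^v + \theta_k & \cdots & x_{1L}^v + \theta_k \\
\vdots & \vdots & & \vdots \\
x_{k_1 1}^v + \theta_k & x_{k_1 2}^v + \theta_k & \cdots & x_{k_1 L}^v + \theta_k \\
x_{k_2 1}^v - \theta_k & x_{k_2 2}^v - \theta_k & \cdots & x_{k_2 L}^v - \theta_k \\
\vdots & \vdots & & \vdots \\
x_{n1}^v - \theta_k & x_{n2}^v - \theta_k & \cdots & x_{nL}^v - \theta_k
\end{pmatrix}, \qquad (5)$$

$\Delta z_r = (y_1^v, \ldots, y_{k_1}^v, y_{k_2}^v, \ldots, y_n^v)'$ and $\varepsilon_r = (\varepsilon_1^v, \ldots, \varepsilon_{k_1}^v, \varepsilon_{k_2}^v, \ldots, \varepsilon_n^v)'$. Note that 
the upper $k_1 \times L$ and lower $(n - k_2 + 1) \times L$ partition matrices of $Z_r^v(\theta_k)$ correspond 
to the $A(t, \theta_k)^-$ and $A(t, \theta_k)^+$ outer AR schemes of (2), respectively. 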

It follows that any new threshold, $\theta \neq \theta_k$, changes the entries of the 
outer-regime regressor matrix, $Z_r^v(\theta_k)$. In addition, some specific thresholds change the 
size (number of rows) of $Z_r^v(\theta_k)$ and also of the inner-regime regressor matrix $Z_s^v$ 
via the addition/deletion of cases. These specific threshold values are the order 
statistics of $v_{t-d}$, that is, the discrete set of order-statistic candidates defined in §1, which determine a countable number of 
continuous nonoverlapping intervals $[\theta_i, \theta_{i+1})$, where $\theta_i$ and $\theta_{i+1}$ denote consecutive 
order statistics. The latter define the threshold space, $\Theta = \{\cup_i [\theta_i, \theta_{i+1})\} \subset \mathbb{R}^+$. 

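For $\theta \in [\theta_i, \theta_{i+1})$ matrix (5) can be rewritten as $Z_r^v(\theta) = Z0_r + U_r^v$ where: 

$$Z0_r = \begin{pmatrix}
x_{11}^v & x_{12}^v & \cdots & x_{1L}^v \\
\vdots & \vdots & & \vdots \\
x_{k_1 1}^v & x_{k_1 2}^v & \cdots & x_{k_1 L}^v \\
x_{k_2 1}^v & x_{k_2 2}^v & \cdots & x_{k_2 L}^v \\
\vdots & \vdots & & \vdots \\
x_{n1}^v & x_{n2}^v & \cdots & x_{nL}^v
\end{pmatrix} \qquad (6)$$

and $U_r^v = u_r u_\theta' = (1, \ldots, 1, -1, \ldots, -1)'(\theta, \ldots, \theta)$ is a rank-one matrix with $u_r$ 
an $r$-vector whose first $k_1$ components are all 1 and the remaining $(n - k_2 + 1)$ 
components are all −1, and $u_\theta$ is an $L$-vector. Thus estimation of $\alpha$ entails a 
regressor matrix which depends on an unknown threshold parameter. 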



3 Parameter Dependent Least Squares Problem 

Consider the linear regression model $y = X(\theta)\gamma + \varepsilon$ and the associated LS problem 

$$\min_{\gamma} \| X(\theta)\gamma - y \|, \qquad (7)$$

where $y$ and $\gamma$ are the $n \times 1$ and $m \times 1$ regressand and parameter vectors, 
respectively, and $X(\theta)$ is a full-column rank $n \times m$ ($n > m$) regressor matrix which 
depends explicitly on a parameter $\theta$. The solution of (7) can be written in terms 
of the Moore-Penrose inverse of $X$ (Björck 1996) as 

$$\hat{\gamma} = X(\theta)^+ y. \qquad (8)$$




Analogously, the RSS can be expressed as 

$$\| e^\theta \|_2^2 = y'(I - X(\theta)X(\theta)^+)y, \qquad (9)$$

where $I$ is the $n \times n$ identity matrix. Since $X(\theta)$ is full-column rank, 
$X(\theta)'X(\theta)$ is nonsingular and (9) can be calculated by 

$$\| e^\theta \|_2^2 = y'(I - X(\theta)(X(\theta)'X(\theta))^{-1}X(\theta)')y. \qquad (10)$$
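
Formula (10) need not be evaluated through the normal equations. As a brief sketch (not taken from the paper's code; the function names `rss_normal_equations` and `rss_qr` and the random data are hypothetical), the same RSS can be read off the R factor of the augmented matrix $(X \,|\, y)$, whose last diagonal entry squared equals the residual sum of squares:

```python
import numpy as np

def rss_normal_equations(X, y):
    """RSS via formula (10): y'(I - X (X'X)^{-1} X') y."""
    P = X @ np.linalg.solve(X.T @ X, X.T)
    return float(y @ (y - P @ y))

def rss_qr(X, y):
    """RSS as the squared last diagonal entry of the R factor of (X | y)."""
    R = np.linalg.qr(np.column_stack([X, y]), mode="r")
    return float(R[-1, -1] ** 2)

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 3))
y = rng.standard_normal(20)

gamma_hat = np.linalg.lstsq(X, y, rcond=None)[0]
rss_direct = float(np.sum((y - X @ gamma_hat) ** 2))
print(rss_normal_equations(X, y), rss_qr(X, y), rss_direct)
```

The QR route avoids forming $X'X$, which is generally preferable numerically and fits the updating strategy used later in the fitting algorithm.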

Suppose that $X(\theta)$ is a polynomial matrix, that is, its entries are polynomials 
in $\theta$. We are interested in the case $X(\theta) = X_0 + X_1\theta$, where $X_0$ and $X_1$ are 
constant matrices (independent of $\theta$) and rank($X_1$) = 1, to which (5) belongs. 
The following theorem applies to polynomial matrices. 

Theorem 1. Given an $n \times n$ polynomial matrix of degree $r$, $A(\theta) = A_0 + 
A_1\theta + \ldots + A_r\theta^r$, where $A_i$, $i = 1, \ldots, r$, are rank-one matrices, then $\det A(\theta)$ 
is a polynomial of degree $r(r+1)/2$ if $n \ge r$ or $nr - n(n-1)/2$ if $n < r$. 

Following the proof of Theorem 1 in Coakley et al. (2000), analogous results 
can be stated when some of the matrices $A_i$ have rank different from one. We 
are interested in the case where $r = 2$ and the matrix $A_2$ is obtained as $\tilde{A}_2'\tilde{A}_2$ 
with $\tilde{A}_2$ a rank-one matrix. Then $\det A(\theta)$ is a degree-four polynomial and the 
next formula follows directly from the proof of Theorem 1 


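$$\det A(\theta) = \det A_0 + \theta \sum_{i=1}^{n} \det A_{0(1)}^{i} + \theta^2 \Big( \sum_{i=1}^{n} \det A_{0(2)}^{i} + \sum_{i,j=1}^{n} \det A_{0(1,1)}^{i,j} \Big) + \theta^3 \sum_{i,j=1}^{n} \det A_{0(1,2)}^{i,j} + \theta^4 \sum_{i,j,k=1}^{n} \det A_{0(1,1,2)}^{i,j,k}, \qquad (11)$$

where indexes $i$, $j$ and $k$ in the same sum are always different. Letting $a_s^i$ denote 
the transpose of the $s$th row of $A_i$, then 

$$A_{0(1)}^{i} = (a_1^0, \ldots, a_{i-1}^0, a_i^1, a_{i+1}^0, \ldots, a_n^0)',$$

$$A_{0(2)}^{i} = (a_1^0, \ldots, a_{i-1}^0, a_i^2, a_{i+1}^0, \ldots, a_n^0)',$$

$$A_{0(1,2)}^{i,j} = (a_1^0, \ldots, a_{i-1}^0, a_i^1, a_{i+1}^0, \ldots, a_{j-1}^0, a_j^2, a_{j+1}^0, \ldots, a_n^0)'.$$

The matrices $A_{0(1,1)}^{i,j}$ and $A_{0(1,1,2)}^{i,j,k}$ are analogously defined. The following 
corollary particularizes equation (11) for the Band-TAR estimation problem. 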

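Corollary 1. If an $n \times n$ polynomial matrix of degree 2 is obtained as $A(\theta) = 
(B + C\theta)'(B + C\theta)$, where $C$ is a rank-one matrix whose rows are vectors of the 
form $(1, \ldots, 1)$ or $(-1, \ldots, -1)$, then $\det A(\theta)$ is a polynomial of degree 2. 

Proof. For this particular polynomial matrix $A(\theta)$ the matrices $A_1$ and $A_2$ are 

$$A_1 = \begin{pmatrix}
d_1 + d_1 & d_1 + d_2 & \cdots & d_1 + d_n \\
d_2 + d_1 & d_2 + d_2 & \cdots & d_2 + d_n \\
\vdots & \vdots & & \vdots \\
d_n + d_1 & d_n + d_2 & \cdots & d_n + d_n
\end{pmatrix}, \qquad
A_2 = n \begin{pmatrix}
1 & \cdots & 1 \\
\vdots & & \vdots \\
1 & \cdots & 1
\end{pmatrix}, \qquad (12)$$

where $d_i = \sum_j s_j b_{ji}$ and $s_j$ is +1 if the $j$th row of $C$ is $(1, \ldots, 1)$ and −1 
otherwise. The coefficient of $\theta^3$ in (11) vanishes since for the above matrices 

$$\det A_{0(1,2)}^{i,j} = -\det A_{0(1,2)}^{j,i}. \qquad (13)$$

To show this, it suffices to notice that 

$$\det A_{0(1,2)}^{i,j} = \det(a_1^0, \ldots, d_i \mathbf{1} + d, \ldots, n\mathbf{1}, \ldots, a_n^0)' 
= n d_i \det(a_1^0, \ldots, \mathbf{1}, \ldots, \mathbf{1}, \ldots, a_n^0)' + n \det(a_1^0, \ldots, d, \ldots, \mathbf{1}, \ldots, a_n^0)',$$

where $d$ and $\mathbf{1}$ denote the vectors $(d_1, \ldots, d_n)'$ and $(1, \ldots, 1)'$, respectively. The first 
determinant on the right vanishes because two of its rows are equal, and exchanging 
$i$ and $j$ swaps two rows of the second, which changes its sign. 
Analogously, the coefficient of $\theta^4$ is proven to be zero. □ 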

We now state the main result of the paper which is a direct consequence of 
(10) and Corollary 1. 

Theorem 2. If the $n \times m$ ($n > m$) regressor matrix $X(\theta)$ in (7) is a polynomial 
matrix of degree 1, $X(\theta) = X_0 + X_1\theta$, with $X_1$ of rank one and whose rows are 
$(1, \ldots, 1)$ or $(-1, \ldots, -1)$, then the residual sum of squares $\| e^\theta \|_2^2$ is a rational 
function of degree (4,2) provided $X(\theta)$ is a full-column rank matrix. 
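
Corollary 1 and Theorem 2 lend themselves to a quick numerical sanity check. The sketch below is an illustration on random data, not a proof and not the authors' code; the dimensions, the sign pattern `u` and the helper names are assumptions. It fits polynomials to sampled values of $\det(X(\theta)'X(\theta))$ and of $RSS(\theta)\det(X(\theta)'X(\theta))$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 12, 3
X0 = rng.standard_normal((n, m))
u = np.where(np.arange(n) < 5, 1.0, -1.0)  # rows of X1 are (1,...,1) or (-1,...,-1)
X1 = np.outer(u, np.ones(m))               # rank-one matrix
y = rng.standard_normal(n)

def gram_det(theta):
    """det(X(theta)' X(theta)) for X(theta) = X0 + X1 * theta."""
    X = X0 + theta * X1
    return np.linalg.det(X.T @ X)

def rss(theta):
    """Residual sum of squares of the LS fit of y on X(theta)."""
    X = X0 + theta * X1
    resid = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
    return float(resid @ resid)

thetas = np.linspace(-1.0, 1.0, 9)
dets = np.array([gram_det(t) for t in thetas])

# Denominator: a degree-4 fit should return negligible theta^4 and theta^3 terms.
den_coeffs = np.polyfit(thetas, dets, 4)   # highest degree first

# Numerator RSS * det: should be reproduced exactly by a degree-4 polynomial.
num = np.array([rss(t) for t in thetas]) * dets
num_coeffs = np.polyfit(thetas, num, 4)
num_fit_err = np.max(np.abs(np.polyval(num_coeffs, thetas) - num))
print(den_coeffs, num_fit_err)
```

With nine sample points and only five coefficients per fit, a vanishing fit error is a genuine check that the sampled quantities are polynomials of the stated degrees.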

The next section outlines an estimation approach for Band-TARs which builds on 
these results. A more detailed description can be found in Coakley et al. (2000). 

4 The Fitting Algorithm and Simulation Analysis 

For each plausible threshold lag $d \in \{1, 2, \ldots, D\}$ the algorithm iterates as follows. 
For the outer regime: 

1. Calculate the QR factorization of $(Z0_r^v \,|\, \Delta z_r)$ for the initial threshold interval 
$[\theta_{\tau_1 - 1}, \theta_{\tau_1}) \subset \Theta$, where $\theta_{\tau_1 - 1}$ and $\theta_{\tau_1}$ represent (extreme) consecutive order 
statistics of the observed variable. 

2. For each threshold interval $[\theta_i, \theta_{i+1})$ repeat for different $p \in \{1, 2, \ldots, L\}$: 

(a) Generate seven (for instance, equally spaced) values for the threshold in 
the current interval, $\theta_i^j$, $j = 1, \ldots, 7$. 

(b) Calculate the R factor of the matrix $(Z_r^v(\theta_i^j) \,|\, \Delta z_r)$ for $j = 1, \ldots, 7$, 
by means of a rank-one correction update of the decomposition of 
$(Z0_r^v \,|\, \Delta z_r)$. Use these $R_j$ factors to calculate the $RSS_r(\theta_i^j)$. 

(c) Via rational interpolation on the points $\{\theta_i^j, RSS_r(\theta_i^j)\}$ identify the 
$RSS(\theta)$ function associated with the current interval. 

(d) Minimize $RSS(\theta)$ over the current interval to obtain a (locally) optimal 
threshold, $\theta_i^*$, compute the Akaike Information Criterion value at $\theta_i^*$, 
$AIC_r^*$, and move to the next interval. 

Table 1. Monte Carlo simulation results 

FGS approach 
DGP   σ_ε   Var.     Mean bias   t_μ (mins.)   RMSE     B_τ       t_τ (mins.)   MAD 
I     .2    .00049    .00077     .1893         .02210   -.00021   .1873         .01080 
I     .4    .00108    .00109     .4703         .03278   -.00105   .4624         .01520 
I     .9    .28120    .25780     .6059         .58919    .05593   .6008         .08144 
II    .2    .03915   -.12880     .3106         .23592   -.04238   .3088         .05681 
II    .4    .03030   -.03651     .3685         .19472   -.00064   .3688         .04549 
II    .9    .04603   -.00773     .4787         .21448   -.01143   .4767         .05722 
III   .2    .05104    .10290     .5210         .24805    .06048   .5208         .06048 
III   .4    .30790    .31850     .6430         .63937    .13290   .6326         .13290 
III   .9    1.1917    .98270     .8470         1.4680    .31150   .8465         .31530 

RF approach 
DGP   σ_ε   Var.     Mean bias   t_μ (mins.)   RMSE     B_τ       t_τ (mins.)   MAD 
I     .2    .00038   -.00028     .2425         .01949    .00013   .2335         .01023 
I     .4    .00108    .00060     .2332         .02940   -.00105   .2341         .01440 
I     .9    .26110    .24211     .2319         .56500    .04996   .2325         .07308 
II    .2    .03862   -.12870     .2351         .23480   -.04672   .2365         .06494 
II    .4    .03635   -.04014     .2357         .17770   -.00126   .2368         .04246 
II    .9    .04400    .00578     .2359         .20891    .00623   .2370         .05006 
III   .2    .03246    .09917     .2658         .20280    .06753   .2666         .06753 
III   .4    .21313    .30622     .2659         .54164    .13130   .2667         .13130 
III   .9    1.1013    .91536     .2664         1.3917    .28592   .2670         .28601 


The inner regime iterations are simpler since (4) does not depend on $\theta$. 
The best fit Band-TAR parameters are those which minimize an overall Akaike, 
$AIC_r + AIC_s$, over all threshold intervals in $\Theta$. 
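
Steps (a) to (d) of the outer-regime iteration can be sketched as follows. This is a simplified illustration rather than the authors' GAUSS implementation: the R factor is recomputed from scratch instead of being updated by a rank-one correction, the interpolant is obtained from the null vector of a linearized system, the minimization is a dense evaluation on the interval, and all names (`fit_rational`, `minimize_on_interval`, `rss_exact`) and the synthetic data are hypothetical.

```python
import numpy as np

def fit_rational(theta, values, deg_num=4, deg_den=2):
    """Rational interpolant P/Q with P(theta_j) = values_j * Q(theta_j) at all nodes."""
    theta = np.asarray(theta, dtype=float)
    values = np.asarray(values, dtype=float)
    lo, hi = theta.min(), theta.max()

    def scale(x):
        # Map the interval to [-1, 1] to keep the Vandermonde blocks well scaled.
        return (2.0 * np.asarray(x, dtype=float) - lo - hi) / (hi - lo)

    t = scale(theta)
    M = np.hstack([np.vander(t, deg_num + 1),
                   -values[:, None] * np.vander(t, deg_den + 1)])
    coef = np.linalg.svd(M)[2][-1]       # null vector of the 7 x 8 system
    p, q = coef[:deg_num + 1], coef[deg_num + 1:]
    return lambda x: np.polyval(p, scale(x)) / np.polyval(q, scale(x))

def minimize_on_interval(r, lo, hi, num=2001):
    """Crude minimization of r on [lo, hi] by dense evaluation."""
    grid = np.linspace(lo, hi, num)
    vals = r(grid)
    j = int(np.argmin(vals))
    return grid[j], vals[j]

# Synthetic outer-regime data: X(theta) = X0 + theta * u 1'.
rng = np.random.default_rng(0)
n, m = 30, 3
X0 = rng.standard_normal((n, m))
u = np.where(np.arange(n) < 12, 1.0, -1.0)
y = rng.standard_normal(n)

def rss_exact(theta):
    """RSS from the R factor of (X(theta) | y)."""
    X = X0 + theta * np.outer(u, np.ones(m))
    R = np.linalg.qr(np.column_stack([X, y]), mode="r")
    return float(R[-1, -1] ** 2)

nodes = np.linspace(0.0, 2.0, 7)                      # step (a): seven thresholds
node_vals = np.array([rss_exact(t) for t in nodes])   # step (b)
r = fit_rational(nodes, node_vals)                    # step (c)
theta_star, rss_star = minimize_on_interval(r, 0.0, 2.0)  # step (d)
print(theta_star, rss_star, rss_exact(theta_star))
```

Once the interpolant is available, the RSS can be evaluated anywhere in the interval without further least squares fits, which is the source of the savings over a fine grid.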

The above algorithm and subsequent Monte Carlo experiments are 
programmed in GAUSS 3.26, a high-level matrix programming language with 
built-in statistical and econometric functions, and run on a 500MHz Pentium III. Three 
data generating processes (DGPs) are used to create $N_0 + 100$ observations, where 
the initial $N_0 = 200$ observations are discarded to minimize the effects of 
initialization (set at zero). The DGPs used are the following particularizations of the 
self-exciting ($v_{t-d} = z_{t-d}$) Band-TAR model (2): 

I) $q = 2$, $p = 2$, $d = 1$, $\theta = 0.35$, $\beta' = \{0.5, -0.55, -0.75\}$ and $\alpha' = \{-0.8, -0.75\}$ 

II) $q = 1$, $p = 3$, $d = 2$, $\theta = 0.92$, $\beta' = \{0.4, -1.0\}$ and $\alpha' = \{-0.5, -0.73, -0.35\}$ 

III) $q = 3$, $p = 5$, $d = 1$, $\theta = 0.18$, $\beta' = \{-0.95, -1.65, 0.8, 0.45\}$ and $\alpha' = 
\{-1.8, 0.35, 0.4, -0.6, -0.75\}$ 

Three different error terms $\varepsilon_t \sim iid\ N(0, \sigma_\varepsilon^2)$, $\sigma_\varepsilon = \{0.2, 0.4, 0.9\}$, are 
employed which combined with the above DGPs imply 9 different specifications. 
To focus on the ceteris paribus effect of the continuous RSS rational function 
component of our fitting approach (RF hereafter) we take as benchmark a fast 
GS method (FGS) with $\lambda = .001$, which also uses QR factorizations and Givens 
updates. The summary statistics used in the analysis are the sample variance 
($\sigma_\theta^2$), mean bias ($B_\mu$), root mean squared error (RMSE), median of bias ($B_\tau$), 
mean absolute deviation (MAD), mean computation time ($t_\mu$) and median of 
computation time ($t_\tau$). Table 1 reports the results based on $M = 500$ 
replications. A comparison of bias measures across methods reveals that, despite the 
small $\lambda$ used, FGS generally yields threshold estimates more biased than those 
from RF. For instance, the RMSE and MAD from RF are smaller than those 
from FGS in 9 and 7 cases, respectively, out of the 9 specifications explored. 

The discontinuity imposed by the Heaviside function requires solving a 
number of LS problems sequentially to identify and estimate the Band-TAR model. 
While computation costs may not be an issue in ad hoc TAR fitting to a 
single time series, these are germane in inference analysis using simulation 
techniques. The growing evidence of nonlinear behaviour and in particular of 
regime-switching dynamics in economic time series has fostered the development of new 
tests (which can be viewed as extensions of existing linear tests) in a TAR 
framework. Exploring the small sample properties of these tests by Monte Carlo 
or bootstrap methods and/or estimating response surfaces with a sensible 
number of replications can quickly become intractable if the computation costs of 
TAR fitting are disregarded. 

The latter underlines the importance of using efficient numerical tools in 
TAR fitting such as the QR approach and Givens rotations. Table 1 provides 
prima facie evidence of how these tools speed up Band-TAR fitting (Coakley et 
al. 2000). More interestingly perhaps, while the computation costs of the FGS 
method (as measured by $t_\mu$ and $t_\tau$) increase with the innovation volatility 
(noise), these are invariant to the latter in the RF method and depend only on 
the sample size. This difference is likely to be relevant when fitting Band-TARs 
to highly volatile data such as those associated with financial variables. 

5 Conclusions 

This paper shows that the RSS of Band-TAR models is a continuous rational 
function of the threshold. Using this result we propose a novel fitting approach, 
which allows for a continuous range for the threshold while keeping 
computation costs within tractable limits. It uses standard minimization techniques and 
employs QR factorizations and Givens updates. Its efficiency gains over a fast 
grid search are illustrated via Monte Carlo experiments. As computation time 
is highly dependent on the rational interpolation algorithm used, we leave 
improvement of the latter for future research. 

Abstract. A model in which the brain and the other shells of the head are treated as 
conducting media with different conductivities is used to 
study the inverse electroencephalographic problem. Techniques of 
potential theory are used to transform the model into an operator 
problem which, under some conditions, yields uniqueness of the 
recovery of aggregates of cortical neurons (sources) in the cerebral cortex from 
measurements of the potential on the scalp. 



1 Introduction 

Electroencephalography is the best known of the 
nondestructive methods for investigating the brain and is based on recording 
its electric activity. The scalp EEG is a valuable clinical tool. Furthermore, the 
evoked potential measured on the scalp shows promise in the diagnosis and 
treatment of central nervous system diseases ([10], p. 5). The potential field produced 
by this electric activity opens great possibilities for investigation ([10], [13]) that 
lead to the statement of inverse electroencephalographic problems (IEP) ([9], [10], [11]). 
Different statements of the IEP can be found in ([7], [10], [13]). 

The IEP thus consists, in outline, in determining the sources in the cerebral cortex 
from measurements of the potential on the scalp. The IEP belongs to the class 
of problems called ill-posed. 

2 Model of Conducting Medium for the 
Electroencephalographic Activity 

We suppose that the human head, considered as a conducting medium, is divided 
into five disjoint zones as shown in figure 1, namely: 

1. $\Omega_1$: Brain 
2. $\Omega_2$: Muscles 
3. $\Omega_3$: Intracranial liquid 
4. $\Omega_4$: Skull 
5. $\Omega_5$: Scalp 

Fig. 1. The head is divided into shells with different conductivities 

In the following, we will suppose that we have a conducting medium $\Omega = 
\bigcup_{i=1}^{5} \Omega_i$, shown in figure 1, where each component $\Omega_i$ has a constant 
conductivity $\sigma_i$, with $\sigma_i \neq \sigma_j$ for $i \neq j$. 

We denote by $S_i$ the surfaces which compose the boundaries of the 
regions $\Omega_i$: $\partial\Omega_1 = S_0 \cup S_1$; $\partial\Omega_2 = S_0 \cup S_2$; $\partial\Omega_3 = S_1 \cup S_2 \cup S_3$; $\partial\Omega_4 = 
S_3 \cup S_4$; $\partial\Omega_5 = S_4 \cup S_5$ and by: [2q = [2 = R^\^2, fl = HXS^. 

We suppose that the currents in the region $\Omega$ are produced only by the electric 
activity of the brain. Such currents are of two kinds: Ohmic and impressed. 

The movement of charged ions through the extracellular fluid and the diffusion 
currents through the neuronal membranes produce the Ohmic currents and 
the impressed currents, respectively. 

We denote by $J$ the volumetric density of impressed current in $\Omega_1$ and 
by $j$ the surface density of impressed current on $S_1$ (cerebral cortex). 

So the volumetric current density in the region $\Omega_1$ occupied by the brain is 
([9], p. 88): $J_1^T = J + \sigma_1 E_1$, where $E$ is the electric field generated and, by 
Ohm's law, $\sigma_1 E$ denotes the density of Ohmic currents. In the other regions we 
will consider Ohmic currents only. 



It is possible to neglect the term $\partial\rho_i/\partial t$ in the continuity equation ([6]): 
$\nabla \cdot J_i^T + \partial\rho_i/\partial t = 0$, where $J_i^T$ and $\rho_i$ denote the current density and the charge 
density in each region $\Omega_i$, $i = 1, \ldots, 5$, respectively. 
In this way we obtain: 

$$\nabla \cdot (J + \sigma_1 E_1) = 0 \quad \text{in } \Omega_1, \qquad (1)$$

$$\nabla \cdot (\sigma_j E) = 0 \quad \text{in } \Omega_j, \quad j = 2, \ldots, 5. \qquad (2)$$

We can consider that the magnetic field $B$ generated by the electric activity of 
the brain satisfies $\partial B/\partial t = 0$ ([13], p. 206). Therefore, there exists an electrostatic 
potential $u$ such that $E = \nabla u$. The potential $u$ satisfies the following equations: 

$$\Delta u = -\frac{1}{\sigma_1} \nabla \cdot J \quad (\Omega_1), \qquad (3)$$

$$\Delta u = 0 \quad (\Omega_i); \quad i = 2, \ldots, 5. \qquad (4)$$

We introduce the following notation: $u_i = u|_{\Omega_i}$, $i = 1, \ldots, 5$; $n_0$ is the outward 
unit normal vector on $S_0$; $n_i$ is the outward unit normal vector 
to $\Omega_i$ on $S_i$, $i = 1, \ldots, 5$; $f(x) = -\frac{1}{\sigma_1} \nabla \cdot J(x)$, $x \in \Omega_1$; $\psi(x) = -(j \cdot 
n_1)(x)$, $x \in S_1$. 

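The boundary conditions are the continuity of the potentials and the continuity of the normal component of the current on the surfaces $S_j$ ($j = 0, \dots, 5$) which separate the regions $\Omega_i$ ([14]). Such conditions take the form:

$$
\begin{aligned}
u_1 &= u_2, & \sigma_1 \frac{\partial u_1}{\partial n_0} &= \sigma_2 \frac{\partial u_2}{\partial n_0} && (S_0) \\
u_1 &= u_3, & \sigma_1 \frac{\partial u_1}{\partial n_1} &= \sigma_3 \frac{\partial u_3}{\partial n_1} + \varphi && (S_1) \\
u_2 &= u_3, & \sigma_2 \frac{\partial u_2}{\partial n_2} &= \sigma_3 \frac{\partial u_3}{\partial n_2} && (S_2) \\
u_3 &= u_4, & \sigma_3 \frac{\partial u_3}{\partial n_3} &= \sigma_4 \frac{\partial u_4}{\partial n_3} && (S_3) \\
u_4 &= u_5, & \sigma_4 \frac{\partial u_4}{\partial n_4} &= \sigma_5 \frac{\partial u_5}{\partial n_4} && (S_4) \\
 & & \frac{\partial u_5}{\partial n_5} &= 0 && (S_5)
\end{aligned}
\qquad (5)
$$

where $\partial u_i / \partial n_j$ denotes the normal derivative of $u_i$ on $S_j$ with respect to $n_j$.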

In the following we study the boundary problem (3)-(5). We call this problem the Electroencephalographic Boundary Problem (EBP).



3 Application of Potential Theory Methods to Obtain a Weak Solution of the EBP with f = 0

To solve the problem of finding a harmonic function $u$ in $\Omega$ with a Dirichlet boundary condition $g$ or a Neumann boundary condition $h$, we use the techniques of potential theory. The solution is sought as a double-layer or a single-layer potential, respectively. The conditions $g$ and $h$ correspond to measurements of the potential or of the current on the boundary $S$ of $\Omega$, respectively.

If $f$ and $g$ are continuous functions on the boundary of $\Omega$, that is to say, $f, g \in C(S)$, the problems are transformed into finding a charge density $\rho$ for the Neumann boundary condition and a density of dipole moments $\rho$ for the Dirichlet condition, which satisfy Fredholm operator equations of the second kind ([8]).

Henceforth, we suppose that there are no volumetric sources, since in that case the inverse problem does not have a unique solution.

In general, we will not assume that the EEG measured on the scalp arises from a potential which is distributed continuously on the scalp, because in that case the density of the currents producing such a measurement would be distributed in a uniform way. In general this does not occur, because the current is concentrated in the "active zone", which can be distributed in an irregular way. For this reason, we suppose that the boundary conditions in the EBP belong to $L_2(S_1)$. To apply the methods of potential theory to boundary conditions in $L_2$ we need Sokhotski formulas in $L_2$, analogous to the Sokhotski formulas for the continuous case ([8], pp. 88).

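We consider the single-layer potential $V(x) = \frac{1}{4\pi} \int_S \frac{\rho(y)}{|x - y|}\, ds_y$. From the following result ([12], Chap. 1, §4): if $G$ is a measurable and bounded subset of $\mathbb{R}^m$ and $K$ is a weakly singular integral operator on $G$ whose kernel has a singularity of order $\lambda$, $0 < \lambda < m$, where $\lambda p' > m$, $\frac{1}{p} + \frac{1}{p'} = 1$, then the operator $K : L_p(G) \to L_q(G)$ is compact for each $q > 1$ below a bound determined by $m$, $\lambda$ and $p$, we obtain that if $S$ is a Lyapunov surface of class $C^{1,\alpha}$ then the principal values $V_0$ and $\left(\frac{\partial V}{\partial n}\right)_0$ of the single-layer potential and of its normal derivative can be extended from $C(S)$ to compact operators in $L_2(S)$. By the last result, the Sokhotski formulas for the normal derivatives of a single-layer potential, for boundary conditions in $L_2$, take the form:

$$\frac{\partial V}{\partial n_i} = \left(\frac{\partial V}{\partial n}\right)_0 + \frac{\rho}{2} \qquad (6)$$

$$\frac{\partial V}{\partial n_e} = \left(\frac{\partial V}{\partial n}\right)_0 - \frac{\rho}{2} \qquad (7)$$

where $n$ represents the outward normal to the surface $S$ and $\frac{\partial V}{\partial n_i}$, $\frac{\partial V}{\partial n_e}$ represent the limit values of the interior and exterior normal derivatives of the single-layer potential, respectively.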
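
The jump of the normal derivative across a single layer can be checked numerically in the simplest geometry. The sketch below is only an illustration under stated assumptions: the surface is a sphere of radius `R` with a constant density `rho` (neither appears in the text), and the function name `radial_derivative` is hypothetical. With the $1/4\pi$ normalisation, the interior limit of the normal derivative minus the exterior limit should equal the density.

```python
import math

def radial_derivative(r, rho=1.0, R=1.0, n=20000):
    """d/dr of V(x) = (1/4pi) * integral of rho/|x-y| over a sphere of radius R,
    evaluated at distance r from the centre (midpoint rule in the polar angle)."""
    h = math.pi / n
    total = 0.0
    for k in range(n):
        theta = (k + 0.5) * h
        d2 = R * R + r * r - 2.0 * R * r * math.cos(theta)
        total += math.sin(theta) * (R * math.cos(theta) - r) / d2 ** 1.5
    return 0.5 * rho * R * R * total * h

rho = 2.0
inside = radial_derivative(0.9, rho)    # exact value: 0 (V is constant inside)
outside = radial_derivative(1.1, rho)   # exact value: -rho * R**2 / r**2
# Rescale the exterior value to the surface r = R = 1 and form the jump.
jump = inside - outside * 1.1 ** 2
```

The rescaling by $r^2$ uses the fact that the exterior potential of a uniformly charged sphere decays like $1/r$, so it is specific to this test geometry.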



Definition 1 (Solvability of the EBP with f = 0). Given a function $\varphi \in L_2(S_1)$ such that $\int_{S_1} \varphi \, ds_1 = 0$, we say that the EBP with $f = 0$ is solvable if there exists a sequence of classical solutions $v_n \in \bigcap_{i=1}^{5} \{ C^2(\Omega_i) \cap C^1(\overline{\Omega}_i) \} \cap C(\overline{\Omega})$, $n \in \mathbb{N}$, of the problem $\Delta v_n(x) = 0$, $x \in \Omega_i$, where $v_n$ satisfies the boundary conditions (5) with $\varphi_n \in C(S_1)$ instead of $\varphi$, $\int_{S_1} \varphi_n \, ds_1 = 0$, and $\varphi_n \to \varphi$ in $L_2(S_1)$. Under these conditions, if the limit in $L_2(\Omega)$ of the sequence $v_n$ exists, it is called the weak solution of the EBP with $f = 0$.



We begin with the assumption that the boundary condition of the EBP with $f = 0$ is a continuous function on $S_1$, and we seek its classical solution as a sum of single-layer potentials with respect to continuous densities $\rho_i$ on $S_i$, $i = 0, \dots, 5$, that is to say,

$$R(\rho)(x) = u(x) = \frac{1}{4\pi} \sum_{i=1}^{5} \int_{\tilde{S}_i} \frac{\tilde{\rho}_i(y_i)}{|x - y_i|} \, ds_i \qquad (8)$$

where $\tilde{S}_1 = S_0 \cup S_1$, $\tilde{S}_2 = S_0 \cup S_2$, $\tilde{S}_j = S_j$, $j = 3, 4, 5$; $\tilde{\rho}_j(x) = \rho_j(x)$, $x \in S_j$, $j = 3, 4, 5$;

$$\tilde{\rho}_1(x) = \begin{cases} \rho_1(x), & x \in S_1 \\ \rho_0(x), & x \in S_0 \end{cases}$$

and $\tilde{\rho}_2(x) = \rho_2(x)$ for $x \in S_2$, with $\tilde{\rho}_2$ on $S_0$ defined analogously; finally $\rho = (\rho_0(x_0), \rho_1(x_1), \rho_2(x_2), \rho_3(x_3), \rho_4(x_4), \rho_5(x_5))^T$.

For any choice of the densities $\rho_i$, the function $u(x)$ defined in (8) satisfies the Laplace equation $\Delta u(x) = 0$, $x \in \Omega$. If we require $u(x)$ to satisfy the boundary conditions for the continuity of the normal component of the current given in (5), then from the classical Sokhotski formulas we obtain a system of Fredholm integral equations of the second kind for determining the densities $\rho_i$ ([3] and [4]), which can be written in matrix form:



$$(K + I)\rho = J \qquad (9)$$



where $J = \left(0,\ \frac{2\varphi}{\sigma_1 + \sigma_3},\ 0,\ 0,\ 0,\ 0\right)^T$ and $K = (K_{ij})$, $i, j = 0, \dots, 5$, is the matrix whose components are the integral operators $K_{ij} : L_2(S_j) \to L_2(S_i)$. Each $K_{ij}$ is a conductivity factor times the principal value, on $S_i$, of the normal derivative of the single-layer potential $V_j$ with density $\rho_j$:

$$K_{ij} = c_i \left( \frac{\partial V_j}{\partial n_i} \right)_0, \qquad c_1 = \frac{2(\sigma_1 - \sigma_3)}{\sigma_1 + \sigma_3}, \quad c_2 = \frac{2(\sigma_2 - \sigma_3)}{\sigma_2 + \sigma_3}, \quad c_3 = \frac{2(\sigma_3 - \sigma_4)}{\sigma_3 + \sigma_4}, \quad c_4 = \frac{2(\sigma_4 - \sigma_5)}{\sigma_4 + \sigma_5},$$

for the rows $i = 1, \dots, 4$. The row $i = 0$ is built with the factor $\frac{\sigma_1 - \sigma_2}{\sigma_1 + \sigma_2}$, and in the column $j = 0$ the factors of the rows $i = 1, \dots, 4$ appear doubled, because $S_0$ belongs to both $\tilde{S}_1$ and $\tilde{S}_2$. For the entries that involve $S_0$ we have introduced the notation

$$\left( \frac{\partial V_k}{\partial n_i} \right)_{0,0} + \left( \frac{\partial V_k}{\partial n_i} \right)_{0,k}, \qquad k = 1, 2,$$

for the sum of the contributions of the integrals over $S_0$ and over $S_k$, each of them understood as a principal value when the point of evaluation lies on the surface of integration.





The matrix operator $K$ is compact in $L_2(S_0) \times \cdots \times L_2(S_5)$ because the operators $K_{ij} : L_2(S_j) \to L_2(S_i)$ are compact.

The proof of the following theorem can be found in [3] and [4].
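
To make the structure of a Fredholm equation of the second kind concrete, the following sketch discretises a one-dimensional model equation $\rho(x) + \int_0^1 k(x, y) \rho(y)\, dy = f(x)$ with the midpoint (Nyström) rule and solves it by fixed-point iteration. This is an assumption-laden toy, not the method of the paper: the kernel $k(x, y) = xy$, the right-hand side and the function name `solve_second_kind` are all illustrative, and plain iteration converges only because this kernel has norm below one. The operator $K$ of the EBP has the eigenvalue $-1$, so it would need a different treatment.

```python
def solve_second_kind(kernel, rhs, n=100, iterations=40):
    """Solve rho(x) + integral_0^1 kernel(x, y) rho(y) dy = rhs(x) on [0, 1]
    by midpoint (Nystrom) discretisation and fixed-point iteration."""
    h = 1.0 / n
    nodes = [(i + 0.5) * h for i in range(n)]
    f = [rhs(x) for x in nodes]
    rho = f[:]
    for _ in range(iterations):
        # rho_new = f - K rho, with the integral replaced by a midpoint sum
        rho = [f[i] - h * sum(kernel(nodes[i], nodes[j]) * rho[j] for j in range(n))
               for i in range(n)]
    return nodes, rho

# Model problem with exact solution rho(x) = 0.75 * x.
nodes, rho = solve_second_kind(lambda x, y: x * y, lambda x: x)
```

For this model problem the exact solution follows by hand: writing $c = \int_0^1 y \rho(y)\, dy$ gives $\rho(x) = x(1 - c)$ and $c = (1 - c)/3$, hence $c = 1/4$.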



Theorem 1 The weak solution of the EBP with $f = 0$ exists and is unique for any boundary condition $\varphi \in L_2(S_1)$ such that $\int_{S_1} \varphi \, ds_1 = 0$. Furthermore, this solution does not depend on the choice of the sequence $\varphi_n \in C(S_1)$, provided that it satisfies the conditions $\int_{S_1} \varphi_n \, ds_1 = 0$ and $\|\varphi_n - \varphi\|_{L_2(S_1)} \to 0$, $n \to \infty$.

Remarks: Any eigenvector $\rho$ of $K$ for the eigenvalue $\lambda = -1$ is such that $V(\rho)(x)$ is constant in all space, since in this case $V(\rho)(x)$ is a solution of the homogeneous EBP. For this reason, if we define the singular operator $R : L_2(S_0) \times \cdots \times L_2(S_5) \to L_2(S_5)$ by $R(\rho)(x_5) = V(\rho)|_{S_5}$, it is easy to see that there exists a unique eigenvector $\rho^0 = (\rho^0_0(x_0), \rho^0_1(x_1), \dots, \rho^0_5(x_5))$ of $K$ for the eigenvalue $\lambda = -1$ such that

$$R(\rho^0)(x_5) = 1; \quad x_5 \in S_5. \qquad (10)$$



If we solve equation (10) to obtain the eigenvector $\rho^0$, the weak solution of the IEP is reduced to the study of the system:

$$(K + I)\rho = J \qquad (11)$$

$$R\rho = V \qquad (12)$$

where $V$ is a given function in $L_2(S_5)$ which corresponds to the EEG generated by the electric activity in the cerebral cortex and measured on the scalp ($S_5$).

The next theorem guarantees the uniqueness of the recovery of $\varphi$ on the cerebral cortex from the EEG $V$ measured on the scalp.

Theorem 2 Given a measurement $V$ on the scalp, there exists a unique $\varphi$ on the cerebral cortex which produces such a measurement. Consequently, the injectivity of the operator $R$ is proved and therefore the uniqueness of the recovery of $\varphi$ from $V$.

Proof: Let $\varphi$ be such that $V = 0$ on $S_5$. $R(\rho)$ is a sum of single-layer potentials and is therefore harmonic in $\Omega_0$. Furthermore, since $V = 0$ on $S_5$, by the uniqueness of the exterior Dirichlet problem we have $R(\rho) = 0$ in $\mathbb{R}^3 \setminus \Omega$. Because $(K + I)\rho = J$, we have $R(\rho) = 0$ in $\Omega_5$. Together, these results say that $R(\rho)$ vanishes on both sides of $S_5$. Applying formulas (6) and (7) we deduce that $\rho_5 = 0$. With an analogous analysis for the other densities we find that $\rho_k = 0$. So we conclude that $J = 0$ and therefore that $\varphi = 0$. $\square$

However, the inverse operator of $R$ is not continuous, and therefore regularization algorithms must be applied to obtain in a stable way the normal component of the current in the cerebral cortex.

Abstract. SETNL is a set of subroutines written in C++ that makes it possible to manipulate nonlinear programming problems in different ways. Solution procedures which are usually implemented at the level of the optimization modeling languages can thus be moved into the algorithm. An example is presented where two embedded solution methods, one based on economic theory (Negishi) and the other on operations research theory (Dantzig-Wolfe decomposition), are used to solve a dynamic general equilibrium model.



1 Introduction 

Economic models are often formulated in Algebraic Modeling Languages (AMLs) because within this framework models can be described in a high-level language which is very close to standard mathematical notation. As a result, formulations in AMLs are brief and readable thanks to the use of indexes in the definition of parameters, variables and constraints. Once formulated in AMLs, models can be processed by different solvers without having to be modified. In this sense, an AML is a black box: users are freed from implementation considerations as far as the solution process is concerned. However, large-scale or complex nonlinear programming models may need customized solution techniques, such as decomposition procedures, which cannot be easily implemented within this framework. To this end, we have developed a library of C++ routines, called SETNL (see http://ecolu-info.unige.ch/logilab/setnl), which can extract particular block structures of nonlinear programs and manipulate them in different ways during the solution process. SETNL enhances the standard capability of AMLs to process nonlinear programs (NLP). This is performed via a flexible object-oriented approach which allows a variety of solution techniques, such as nested decomposition algorithms, to be implemented. The key point is that problem formulations do not have to be changed, contrary to current practice, which relies on the use of AML procedural statements such as if-then-else or loop integrated into models. SETNL allows such procedural statements to be moved from the models into the solution algorithm itself. This removal of procedural statements makes models more readable and is expected to improve the efficiency of the solution process.

We present the SETNL capabilities through a model having a structure which can be exploited by specialized algorithms. The reference case is a demonstration version of MERGE, a dynamic General Equilibrium Model (GEM) developed by Manne and Richels [5]. It is a model for evaluating the regional and global effects of greenhouse gas (GHG) reduction policies. It quantifies alternative ways of thinking about climate change. The model may explore views on a wide range of contentious issues: costs of abatement, damages of climate change, valuation and discounting. This GEM is formulated in GAMS as a welfare-optimization problem, following Negishi [6]. The Negishi weights are unknown when the solution starts and are determined iteratively. Without SETNL, this is performed by a loop defined in the GAMS model itself. The use of SETNL, however, first makes it possible to implement this loop outside the modeling environment, and second to exploit the special structure of the model optimized at each iteration.

Indeed, each NLP that is solved inside a Negishi iteration displays a primal block angular structure which makes it a good candidate for the Dantzig-Wolfe decomposition [3]. The principle of decomposition is by now a well-known idea in mathematical programming. In the early 1960s, decomposition techniques were proposed as a promising approach for addressing the limitations of computers in solving large-scale mathematical models. However, research and experiments were limited to linear and integer programming. By the 1990s, more attention was focused on nonlinear programs, as computational power was making large-scale nonlinear programming possible. Recent results have shown that NLP decomposition methods can help not only to solve intractably large problems, but also to drastically improve solution times and, in some cases, to remove numerical instabilities [1]. With the added potential benefits of parallel implementation, decomposition methods can provide an important tool for the economic modeler. The general idea of decomposition is to break down the original mathematical programming problem into smaller, more manageable problems. These subproblems can then be solved and iteratively adjusted so that their solutions can be patched together to derive the solution to the original problem.

As a result, SETNL is able to compute general equilibria via a Dantzig-Wolfe decomposition embedded within a Negishi loop, without requiring modelers to carry out complex reformulations of the original welfare optimum problem. Such a solution technique is designed to improve computational efficiency in general, and even to make very large models tractable. This application also shows that two solution procedures, one based on economic theory (Negishi) and the other based on operations research theory (Dantzig-Wolfe), can both be handled at the algorithmic level.

The paper is organized as follows. Section 2 briefly presents how AMLs process mathematical programs and the early experiments that preceded the current development. The SETNL concept is explained in Section 3. The effectiveness of this approach is illustrated by an example of routines which are useful in a decomposition framework. The way SETNL is inserted between the optimization modeling language and the nonlinear solver is also detailed. In Section 4 we describe a case where a general equilibrium model is solved through SETNL. The algorithm developed here involves two solution procedures: Negishi and decomposition. In Section 5, we give some concluding remarks.

2 Early Experiments with the GAMS Modeling Language 

One of the most important aspects of AMLs is their capability of dealing with nonlinear programs (NLP). In the case of linear programs, all the data of a given problem is transferred in one move from an AML to a solver before the solution process starts. Nonlinear models involve a more complicated process. In this case, any solution algorithm generates iterates and needs updated information such as values, Jacobians and possibly Hessians of the nonlinear functions. This role is assigned to the built-in nonlinear interpreter of AMLs, since solvers have no algebraic knowledge of the model. Figure 1 describes the exchange of information occurring between an AML and a solver.

[Fig. 1. Exchange of $f(x_k)$, $\nabla f(x_k)$, $\nabla^2 f(x_k)$ between the AML and the solver]

In the case of large-scale or complex nonlinear models which necessitate customized solution procedures, this transparent scheme can be a drawback. For example, an automatic decomposition is difficult because AMLs do not produce partial information in a structured way. Early experiments to retrieve the structure of nonlinear programming problems were carried out by Chang and Fragniere [2]. A prototype, in the form of procedures called SPLITDAT and DECOMP, although very simple, made it possible to solve the Ramsey model through a Benders decomposition (the same idea as Dantzig-Wolfe, but adapted to the dual block angular structure, see Section 4). In such a situation some functions of the mathematical program need to be split into several separable pieces. Using an AML, it is necessary to call the nonlinear interpreter to handle only parts of the entire problem. This means some customization of the nonlinear interpreter to avoid generating the full nonlinear information associated with the entire problem. Modeling languages such as GAMS or AMPL provide an I/O library. It is a set of functions allowing one to extract, in due course, the information needed to solve a given optimization problem formulated with AMLs: values, gradients, and Hessians of objective and constraint functions. The aim of the additional I/O subroutines SPLITDAT and DECOMP was to provide the GAMS user with the possibility of using Benders decomposition algorithms within the GAMS modeling language framework. SPLITDAT takes the original model and splits the data into a master problem and one or more subproblems. DECOMP uses the decomposition algorithm to determine the optimal solution of the original problem.

Experiments were performed with an intertemporal aggregate growth model stemming from work done by Ramsey [7]. This model involves three decision variables in each time period $t$: consumption, investment, and capital stock $(C_t, I_t, K_t)$. Consider an economy with a single agent acting as producer, consumer, investor, and saver. Given initial levels of our decision variables $(C_0, I_0, K_0)$, a Cobb-Douglas production function of capital and labor, $f(K_t, L_t) = A K_t^{\beta} L_t^{1-\beta}$, and exogenous labor supplies $L_t$, we want to find an optimal level of consumption, investment, and capital stock. To find this optimal level, we maximize a discounted logarithmic utility function under a capital stock constraint and a production constraint. With a fixed growth rate, $g$, and a utility discount factor, $udf$, the model is written as:

$$
\begin{aligned}
\max \ & \sum_{t \in T} (udf)^t \log(C_t) && t \in T \text{ time periods} \\
\text{s.t.} \ & K_{t+1} = K_t + I_t \\
& a_t K_t^{\beta} = C_t + I_t && \text{where } a_t = A L_t^{1-\beta}, \text{ since labor } L_t \text{ has a fixed growth rate } (L_t = L_0 (1 + g)^t) \\
& C_t, K_t, I_t \ge 0
\end{aligned}
$$
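
The two constraints of this growth model can be exercised with a short simulation. The sketch below is not part of SPLITDAT or DECOMP and does not optimize anything: it rolls the capital equation forward under a fixed saving rate and evaluates the discounted log utility of the resulting path. All numerical values, the name `beta` for the capital exponent of the Cobb-Douglas function, and the function name `simulate_ramsey` are assumptions made for illustration.

```python
import math

def simulate_ramsey(T=20, K0=3.0, L0=1.0, g=0.03, A=1.0, beta=0.25,
                    udf=0.95, saving_rate=0.2):
    """Roll K[t+1] = K[t] + I[t] forward under a fixed saving rate and
    return the path of (C, I, K) and its discounted log utility."""
    K, path, utility = K0, [], 0.0
    for t in range(T):
        L = L0 * (1.0 + g) ** t            # labor grows at the fixed rate g
        a_t = A * L ** (1.0 - beta)
        output = a_t * K ** beta           # production constraint: output = C + I
        I = saving_rate * output
        C = output - I
        utility += udf ** t * math.log(C)
        path.append((C, I, K))
        K = K + I                          # capital stock constraint
    return path, utility

path, utility = simulate_ramsey()
```

Any feasible path produced this way gives a lower bound on the optimal objective, which is the kind of quantity a decomposition scheme compares with its upper bound.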

In this particular case SPLITDAT orders the problem according to the time dimension and then splits it into a master problem and one subproblem. This is done through the use of the I/O dictionary file. The GAMS I/O dictionary file is a character file containing the names of the sets, variables, and equations of the model. For the variables and constraints, each name is formed by the original name of the variable or constraint plus the corresponding set indices. SPLITDAT then recognizes that the capital stock equation is the only transition equation that carries over variables from one time period to the next. Once the master problem and the subproblem have been fully formulated, DECOMP takes the information and iterates the Benders decomposition algorithm until an optimal solution is found. In the 20-period model (see Figure 2), Benders decomposition was able to solve this problem after just 5 iterations, with a relative error tolerance eps given as a negative power of ten (the gap between the upper and lower bound divided by the magnitude of the objective function).

Fig. 2. Staircase structure solved with Benders decomposition

3 SETNL with AMPL and CONOPT 

SETNL is a high-level C++ library which on the one hand gives access to standard information from the modeling system and on the other hand provides the modeler with advanced features such as the automated exploitation of the structure and access to partial nonlinear information. SETNL is made up of two parts. The first part is called SET and is described in [4]. SET enables one to retrieve block structures directly from the algebraic formulation of the original problem. The second part, which is described in the present paper, concerns the manipulation of NLP problems.

Standard practice is that AMLs yield nonlinear information for solvers without making it possible to distinguish subsets of variables or subsets of constraints. This information is however necessary in any decomposition procedure. SETNL makes it possible to break into pieces nonlinear information such as function values, Jacobians and possibly Hessian matrices of nonlinear equations.

Depending on the type of model, SETNL is able to extract a wide range of block structures, for instance by splitting a given structured problem into several subproblems. These subproblems are sent to nonlinear solvers in the appropriate input format, ready to start the optimization process. Every time a nonlinear solver needs updates on partial information, SETNL accesses the nonlinear interpreter to get them. We present below a sample of routines of different complexity which are useful for applying the decomposition scheme to the experiments presented in Section 2:
long get_nlnz();
// returns the number of nonlinear nonzeros of the Jacobian matrix

double get_objvalue(real* X, expr* setcol);
// returns the current value of the objective for a subset of columns
// "setcol" and a given solution X

void get_objgrad(real* X, expr* setcol);
// evaluates the gradient of the objective function for a subset "setcol" of columns

int partition();
// determines the splitting of the problem

The previous example, though simple, is illustrative: as can be seen in Figure 2, once the rows and columns are ordered according to the time dimension they are automatically associated with a subproblem of the staircase structure. Only the objective function needs to be broken apart, in this particular case into two pieces (i.e. the master problem and the subproblem).
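
The idea behind evaluating an objective and its gradient for a subset of columns can be sketched on the separable objective of the growth model. The functions below are hypothetical Python analogues written for illustration only; they are not the SETNL routines and do not reproduce their signatures. The data, the discount factor and the names `objective_value` and `objective_gradient` are assumptions.

```python
import math

def objective_value(x, columns, udf=0.95):
    """Value of sum_t udf**t * log(x[t]) restricted to the indices in `columns`."""
    return sum(udf ** t * math.log(x[t]) for t in columns)

def objective_gradient(x, columns, udf=0.95):
    """Gradient of the same partial objective; entries outside `columns` stay 0."""
    grad = [0.0] * len(x)
    for t in columns:
        grad[t] = udf ** t / x[t]
    return grad

consumption = [1.0, 1.2, 1.5, 1.9]
master, sub = [0, 1], [2, 3]          # split of the time periods into two pieces
total = objective_value(consumption, range(4))
split = objective_value(consumption, master) + objective_value(consumption, sub)
grad = objective_gradient(consumption, master)
```

Because the objective is separable in time, the two partial values add up to the full objective, which is what allows the master problem and the subproblem to be evaluated independently.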

Figure 3 shows how SETNL is integrated in the modeling system. The model formed by the mathematical formulation and the data is processed in the usual way (see Figure 1 for comparison). SETNL is currently linked with the AMPL I/O library and CONOPT (an NLP solver). Two AMPL I/O routines have been modified to accept as new parameters a given set of indexes that lists the variables associated with a given subproblem.




Fig. 3. SETNL 



For instance, in Figure 3 we see that the problem may display a primal block angular structure (see formulation 2). Once SETNL knows about the structure, the problem is split into a master problem and several subproblems in the appropriate format in order to be read by the CONOPT nonlinear solver. SETNL implements in this case the Dantzig-Wolfe algorithm (see Section 4 for a brief explanation). Each time CONOPT asks for information updates (e.g. gradients, function values), SETNL communicates with the nonlinear interpreter to get only the partial nonlinear information needed. The decomposition proceeds until the desired level of precision is attained.

4 Solving Equilibrium Problems as Welfare Optimization 
Problems 

Our intention here is to show that, thanks to SETNL, a model arising from the field of computational economics can be solved through embedded solution procedures arising from different academic disciplines. In the case of the MERGE model we must take into account two procedural statements: a first loop corresponding to the Negishi algorithm and a second one associated with the decomposition algorithm, which is embedded in the Negishi algorithm. Although formally coded with SETNL C++ routines, both loops are explained here in an intuitive manner, referring to computational economics concepts. This allows us to highlight the difficulties encountered by the economic modeler when coding such an algorithm.

The Negishi algorithm is presented here in its original GAMS version as written in [5]. This section appears at the end of the model. The loop is defined for a certain number of iterations which is determined by the modeler (Negishi is indeed known to be a tâtonnement approach). Within each iteration the model, called here NWEL, is optimized by the nonlinear solver. From the primal solutions (all the variables ending with ".L") and the dual solutions (all the variables ending with ".M"), new Negishi weights, NW(), are computed for each region (RG). These Negishi weights are included in the new objective function of the nonlinear programming problem and the process continues until the chosen number of iterations is reached.

LOOP(ITER$(ORD(ITER) NE CARD(ITER)),
  SOLVE JM MAXIMIZING NWEL USING NLP;
  DISPLAY TRDBAL.M;
  PVPI(TRD,PP) = TRDBAL.M(TRD,PP)/TRDBAL.M("NUM","2000");
  NW(RG) = SUM(PP, PVPI("NUM",PP)*C.L(RG,PP))
         + SUM((PP,TRD), PVPI(TRD,PP)*X.L(RG,TRD,PP));
  NW(RG) = NW(RG) / SUM(R, NW(R));
  NWTITR(ITER,RG) = NW(RG);
);
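
The weight update performed inside this loop can be restated outside GAMS. The following sketch mirrors the three assignments (price normalisation, valuation of each region, normalisation of the weights) with plain dictionaries. It is only an illustration: the function name `negishi_weights`, the data layout, the sample numbers and the reading of `X.L` as net exports are assumptions, and the SOLVE step that would produce the primal and dual values is not reproduced.

```python
def negishi_weights(price, consumption, net_exports,
                    numeraire="NUM", base_period="2000"):
    """price[good][period]: duals of the trade balance; consumption[region][period];
    net_exports[region][good][period]. Returns the normalised Negishi weights."""
    scale = price[numeraire][base_period]
    pvpi = {g: {p: v / scale for p, v in row.items()} for g, row in price.items()}
    raw = {}
    for region in consumption:
        value = sum(pvpi[numeraire][p] * c for p, c in consumption[region].items())
        value += sum(pvpi[g][p] * x
                     for g, row in net_exports[region].items()
                     for p, x in row.items())
        raw[region] = value
    total = sum(raw.values())
    return {region: v / total for region, v in raw.items()}

price = {"NUM": {"2000": 2.0, "2010": 1.0}, "OIL": {"2000": 4.0, "2010": 3.0}}
consumption = {"A": {"2000": 10.0, "2010": 20.0}, "B": {"2000": 30.0, "2010": 40.0}}
net_exports = {"A": {"OIL": {"2000": 1.0, "2010": 2.0}},
               "B": {"OIL": {"2000": -1.0, "2010": -2.0}}}
weights = negishi_weights(price, consumption, net_exports)
```

In an outer loop these weights would be fed back into the welfare objective before the next solve, which is the part that SETNL moves from the model into the algorithm.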

The nonlinear programming problem solved within each Negishi iteration is then a large-scale block-angular convex nonlinear programming problem of the following form

$$
\begin{aligned}
\text{maximize} \quad & \sum_{i=1}^{p} f_i(x_i) \\
\text{subject to} \quad & \sum_{i=1}^{p} g_i(x_i) \le 0 \\
& h_i(x_i) \le 0, \quad i = 1, 2, \dots, p,
\end{aligned}
\qquad (2)
$$

where $x_i \in S_i \subset \mathbb{R}^{n_i}$; $S_i$, $f_i$, $g_i$ and $h_i$ are convex, with $f_i : \mathbb{R}^{n_i} \to \mathbb{R}$, $g_i : \mathbb{R}^{n_i} \to \mathbb{R}^{m_0}$, $h_i : \mathbb{R}^{n_i} \to \mathbb{R}^{m_i}$. We further assume that the interior of the feasible set $X$, defined by the above constraints, is not empty. The problem has $n = \sum_{i=1}^{p} n_i$ variables and $m = m_0 + \sum_{i=1}^{p} m_i$ constraints.

The Dantzig-Wolfe decomposition algorithm can be interpreted in an economic way (i.e. as a price-directed decomposition). Essentially, the subproblems pass extreme point proposals from their feasible sets, and the master problem tries to find the right convex combinations of these feasible points to arrive at the optimal allocation of the shared resources among the subproblems. Specifically, the subproblems pass their maximum and their resource usage in a given iteration. The master then returns a price vector telling the subproblems whether they are using too much or too little of the common resources. This procedure can iterate until the prices returned to the subproblems do not change, implying that an optimal solution has been reached. This scheme fits economic models containing multiple regions.
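
The price mechanism described above can be illustrated on a toy resource-sharing problem. The sketch below is deliberately simpler than Dantzig-Wolfe: it has no master problem built from convex combinations of proposals, only a coordinator that adjusts a single price until the total usage of the subproblems matches the available capacity. The utilities $w_i \log x_i$, the step size and the function name `price_coordination` are assumptions chosen so that each subproblem has the closed-form answer $x_i = w_i / \text{price}$.

```python
def price_coordination(weights, capacity, price=2.0, step=0.2, iterations=200):
    """Each subproblem maximises w*log(x) - price*x, i.e. x = w/price; the
    coordinator raises the price when total usage exceeds the capacity."""
    for _ in range(iterations):
        usage = [w / price for w in weights]       # subproblem responses
        price += step * (sum(usage) - capacity)    # price adjustment
    return price, [w / price for w in weights]

price, allocation = price_coordination([1.0, 2.0], capacity=3.0)
```

The iteration stops changing when the price equals the sum of the weights divided by the capacity, which is the point where the shared resource is exactly used up.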

5 Conclusion 

We showed in this paper that the SETNL library makes it possible to code solution procedures for nonlinear models arising from economic theory in an efficient algorithmic environment. Indeed, the difficulty of handling nonlinearities forces economic modelers to program those solution techniques with the help of modeling language syntax, which is not the primary role of these languages. To illustrate the capabilities of SETNL, we developed an embedded Dantzig-Wolfe/Negishi algorithm to solve a computational general equilibrium model. The benefit of these developments is that the economic modeler can focus on the modeling process instead of the solution process.
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Abstract. A constitutive model is developed within the framework of 
Perzyna’s viscoplasticity for predicting the stress-strain-time behaviour 
of soft porous rocks. The model is based on the hyperelasticity and mul- 
tisurface viscoplasticity with hardening. A time-stepping algorithm is 
presented for integrating the creep sensitive law. An example of appli- 
cation to one-dimensional consolidation is presented. The objectives are 
to: 1. present a soft rock model which is capable of taking into account 
the rate sensitivity, time effects and creep rupture; 2. to discuss the use 
of an incremental procedure for time stepping using large time incre- 
ments and 3. to extend the finite element code Lagamine (MSM-ULg) 
for viscoplastic problems in geomechanics. 



1 Introduction 

For solving geomechanical problems such as well-bore stability, subsidence, hydraulic fracturing, etc., the most important issue is to have a proper constitutive model for the complex mechanical behaviour of soft porous rock. Various models have been developed for the time-independent behaviour of chalk as a typical soft porous rock. However, the failure of an underground structure in this rock may occur due to creep deformation, and therefore the use of conventional time-independent procedures for the interpretation of laboratory results and the analysis of geotechnical boundary value problems may result in solutions which do not properly capture the actual in situ response. The model proposed in this study is a time-dependent inelastic model for rocks and soils based on Perzyna's elasto-viscoplastic theory [6]. The motivations for adopting Perzyna's elastic-viscoplastic theory are:

1. The formulation is well accepted and well used; 

2. The generality of the time-rate flow rule offers the capability of simulating 
time-dependent material behaviour over a wide range of loading; 

3. The incorporation of the inviscid multisurface cap-failure-tension model, 
developed in DIG-ULg in the frame of PASACHALK project, was of interest 
([1] and [2]); and 

4. The formulation is readily adaptable to a numerical algorithm suitable for 
finite element procedure and particularly for implementation in Lagamine 
(MSM-ULg) finite element code. 
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2 Mechanical Model. Perzyna’s Viscoplasticity 

Perzyna’s theory [6] is a modification of classical plasticity wherein viscous-like
behaviour is introduced by a time-rate flow rule employing a plasticity yield
function. The strain rate tensor $\dot{\varepsilon}_{ij}$ is composed of elastic and
viscoplastic parts:

$$\dot{\varepsilon}_{ij} = \dot{\varepsilon}^{e}_{ij} + \dot{\varepsilon}^{vp}_{ij} . \qquad (1)$$

The stress rate tensor $\dot{\sigma}_{ij}$ is related to the elastic strain rate via a linear
elastic or hyperelastic constitutive tensor $C_{ijkl}$. Therefore, taking into account
relation (1), the elastoplastic constitutive relation between stress and strain rates reads:

$$\dot{\sigma}_{ij} = C_{ijkl}\left(\dot{\varepsilon}_{kl} - \dot{\varepsilon}^{vp}_{kl}\right) . \qquad (2)$$

The Jaumann type objective time derivative of the stress tensor is defined by
$\overset{\nabla}{\sigma}_{ij} = \dot{\sigma}_{ij} + \omega_{ik}\sigma_{kj} + \sigma_{ik}\omega_{jk}$,
where $\omega_{ij}$ is the anti-symmetric part of the velocity gradient.
The viscoplastic flow rule is expressed as:

$$\dot{\varepsilon}^{vp}_{ij} = \gamma\,\langle\phi(f)\rangle\,\frac{\partial g}{\partial\sigma_{ij}} , \qquad (3)$$

in which $\gamma$ is a fluidity parameter, $\phi$ is the viscous flow function,
$g = g(\sigma, \varpi)$ is the creep potential and $f = f(\sigma, \varpi)$ is any valid
plasticity function, playing the role of loading surface. The parameter $\varpi$ stands
for some hardening function of the viscoplastic strain history, i.e.,
$\varpi = \varpi(\bar{\varepsilon}^{vp})$, where $\bar{\varepsilon}^{vp}$ is an equivalent
viscoplastic strain representing the magnitude of the viscoplastic deformation. For a given
value of $\varpi$, all states of stress that satisfy $f = 0$ form the current "static" yield
surface. The "static" yield surface forms a boundary between the elastic ($f < 0$)
and viscoplastic ($f > 0$) domains. When a constant stress state is imposed such
that $f > 0$, viscoplastic flow will occur. If $f$ is a nonhardening yield function,
the flow will continue to occur at a constant rate. If $f$ is a hardening function,
viscoplastic flow occurs at a decreasing rate because, as viscoplastic strain accumulates,
$\varpi(\bar{\varepsilon}^{vp})$ changes in value such that $f(\sigma, \varpi) \to 0$ and thus
$\phi \to 0$. In this way the static yield surface moves in real time to eventually form
a new static yield surface containing the imposed stress state. Once the new
static yield surface has stabilized, the steady state solution $\dot{\varepsilon}^{vp} = 0$
is achieved. The resulting strains accumulated during this loading would be identical to the
corresponding time-independent plastic solution.
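The decay of the viscoplastic rate under constant stress can be illustrated with a one-dimensional sketch. This is not the chalk model of this paper: it assumes a scalar stress, a linear overstress function $\phi(f) = f/\sigma_{ref}$, linear hardening of the yield stress and explicit time stepping. All names and parameter values are hypothetical.

```python
def creep_under_constant_stress(sigma, sigma_y0, hardening, gamma, sigma_ref,
                                dt, n_steps):
    """Integrate a 1D Perzyna-type flow rule at constant stress.

    Assumed loading function: f = sigma - (sigma_y0 + hardening * eps_vp)
    Assumed flow rule:        d(eps_vp)/dt = gamma * <f / sigma_ref>
    """
    eps_vp = 0.0
    history = []
    for _ in range(n_steps):
        f = sigma - (sigma_y0 + hardening * eps_vp)
        rate = gamma * max(f, 0.0) / sigma_ref   # Macaulay bracket <.>
        eps_vp += dt * rate                      # explicit (forward Euler) update
        history.append((eps_vp, rate))
    return history


history = creep_under_constant_stress(
    sigma=1.2, sigma_y0=1.0, hardening=50.0, gamma=1.0, sigma_ref=1.0,
    dt=0.001, n_steps=20000,
)
final_strain, final_rate = history[-1]

# Time-independent plastic solution: the strain at which f returns to zero.
plastic_strain = (1.2 - 1.0) / 50.0
```

In this sketch the rate decreases monotonically and the accumulated strain approaches `plastic_strain`, which is the behaviour described above for a hardening loading function.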

2.1 Application to the Soft Porous Rocks 

The concept of two inelastic deformation mechanisms, collapse (volumetric
or cap) and shear failure (deviatoric), has been applied to the viscoplastic
analysis. This concept is based on experimental observations for highly porous
rocks, and for chalk especially (see [8]). Therefore:

$$\dot{\varepsilon}^{vp}_{ij} = \dot{\varepsilon}^{c}_{ij} + \dot{\varepsilon}^{d}_{ij} . \qquad (4)$$
Indexes $c$ and $d$ denote the collapse (cap or volumetric) and the deviatoric (shear
failure) viscoplastic strain components and model parameters, respectively. In
the present work the loading surface and creep potential are functions of the
first, second and third stress invariants: $I_\sigma = \sigma_{ij}\delta_{ij}$,
$II_s = \sqrt{\tfrac{1}{2} s_{ij}s_{ij}}$ and
$\beta = -\tfrac{1}{3}\sin^{-1}\left(\frac{3\sqrt{3}}{2}\,\frac{III_s}{II_s^{3}}\right)$,
where $III_s = \tfrac{1}{3} s_{ik}s_{kj}s_{ji}$ and $s_{ij} = \sigma_{ij} - I_\sigma\delta_{ij}/3$
is the stress deviator. The static yield surface $f = 0$ and the creep potential function
$g = 0$ are divided into two regions along the first stress invariant axis: the
cap surface region ($I_\sigma < L = \tfrac{1}{2}\left(\frac{3c}{\tan\phi_C} - 3p_0\right)$)
with $f_c$ and $g_c$, and the failure surface region ($I_\sigma > L$) with $f_d$ and $g_d$.
Here $c$ is the cohesion, $\phi_C$ is the friction angle in the compression path and $p_0$
is the preconsolidation pressure. Such an approach overcomes the difficulties, mentioned
in [7], of extending Perzyna-type viscoplasticity to the case of multisurface inelasticity.

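Cap loading surface $f_c$ is a hardening surface defined by

$$f_c = II_s^{2} + m^{2}\left(I_\sigma - \frac{3c}{\tan\phi_C}\right)\left(I_\sigma + 3p_0\right) = 0 , \qquad (5)$$

where

$$m = a\,(1 + b\sin 3\beta)^{n} , \qquad (6)$$

$$b = \frac{[\sin\phi_C(3 + \sin\phi_E)]^{1/n} - [\sin\phi_E(3 - \sin\phi_C)]^{1/n}}{[\sin\phi_C(3 + \sin\phi_E)]^{1/n} + [\sin\phi_E(3 - \sin\phi_C)]^{1/n}} , \qquad (7)$$

$$a = \frac{1}{\sqrt{3}}\,\frac{2\sin\phi_C}{3 - \sin\phi_C}\,(1 + b)^{-n} , \qquad (8)$$

$n$ is a model parameter and $\phi_E$ is the friction angle in the extension stress path.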
We assume that: 



Hypothesis 1 Hardening for the cap surface is due to the volumetric inelastic
strain $\varepsilon_v^{c}$ and therefore $f_c = f_c(\sigma, \varepsilon_v^{c})$. The only
hardening variable for the cap surface, $p_0$, depends only on $\varepsilon_v^{c}$.



The hardening law is given as:

$$\dot{p}_0 = \frac{1 + e}{\lambda - \kappa}\,p_0\,\dot{\varepsilon}_v^{c} , \qquad (9)$$



with $e$ - the void ratio, $\lambda$ - the slope of the virgin consolidation line and
$\kappa$ - the slope of the swell/recompression line in $e - \ln(I_\sigma/3)$ space. For the
cap deformation mechanism $f_c = g_c$ and, referring to eq. (3), the associated viscoplastic
law is:

$$\dot{\varepsilon}^{c}_{ij} = \gamma_c\,\langle\phi_c(f_c)\rangle\,\frac{\partial f_c}{\partial\sigma_{ij}} , \qquad (10)$$



where 





with $\omega$, $\iota$ and $p_a$ - material constants.
The failure loading surface is a hardening Van Eekelen [4] yield function:

$$f_d = II_s + m\left(I_\sigma - \frac{3c}{\tan\phi_C}\right) = 0 . \qquad (11)$$



Hypothesis 2 For the failure surface there is no hardening due to the cap type 
deformation. The equivalent deviatoric inelastic strain 

$$e_d = \int_0^t \left(\tfrac{2}{3}\,\dot{e}^{d}_{ij}\,\dot{e}^{d}_{ij}\right)^{1/2} d\tau$$

is the only hardening parameter for the failure surface, so $f_d = f_d(\sigma, e_d)$ and
thus the internal state variables for the failure deformation mechanism - the
friction angles and the cohesion - are functions only of $e_d$.

For the internal state variables, the concrete expressions explored in the present
work are given as in [3]:

$$\phi_C = \phi_{C0} + (\phi_{Cf} - \phi_{C0})\,\frac{e_d}{B_p + e_d} , \qquad (12)$$

$$\phi_E = \phi_{E0} + (\phi_{Ef} - \phi_{E0})\,\frac{e_d}{B_p + e_d} , \qquad (13)$$

$$c = c_0 + (c_f - c_0)\,\frac{e_d}{B_c + e_d} , \qquad (14)$$

where $\phi_{C0}$, $\phi_{E0}$, $c_0$ and $\phi_{Cf}$, $\phi_{Ef}$, $c_f$ are the initial and
final friction angles and cohesion in compression ($C$) and extension ($E$) stress paths.
Coefficients $B_p$ and $B_c$ are the values of the equivalent deviatoric inelastic strain at
which half of the hardening of the friction angles and of the cohesion, respectively, is
achieved, see [3].
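The meaning of the coefficients $B_p$ and $B_c$ can be checked with a short sketch of the hyperbolic law $x = x_0 + (x_f - x_0)\,e_d/(B + e_d)$. The function name and the numerical values below are hypothetical and chosen only to show the shape of the curve.

```python
import math


def hyperbolic_hardening(initial, final, half_strain, e_d):
    """Evolve an internal variable from `initial` towards `final` with e_d.

    Uses x = x0 + (xf - x0) * e_d / (B + e_d), where B = half_strain.
    """
    return initial + (final - initial) * e_d / (half_strain + e_d)


# Hypothetical friction angles (radians) and half-hardening strain.
phi_0, phi_f, B_p = math.radians(25.0), math.radians(35.0), 1.0e-3

phi_at_zero = hyperbolic_hardening(phi_0, phi_f, B_p, 0.0)      # initial value
phi_at_half = hyperbolic_hardening(phi_0, phi_f, B_p, B_p)      # halfway
phi_at_large = hyperbolic_hardening(phi_0, phi_f, B_p, 1.0e3)   # near final value
```

At $e_d = B$ the variable sits exactly halfway between its initial and final values, and it approaches the final value asymptotically for large $e_d$.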

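The viscoplastic flow law is non-associated and, taking into account (3), it is
given by:

$$\dot{\varepsilon}^{d}_{ij} = \gamma_d\,\langle\phi_d(f_d)\rangle\,\frac{\partial g_d}{\partial\sigma_{ij}} , \qquad (15)$$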

with = (^ , 7 d = 7 c 02 , where ad and 02 are material constants. 

The potential function $g_d$ depends on the dilatancy angles $\psi_C$ and $\psi_E$ in
compression and extension paths, respectively, and is given as:

$$g_d = II_s + m'\left(I_\sigma - \frac{3c}{\tan\phi_C}\right) = 0 , \qquad (16)$$

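where

$$m' = a'\,(1 + b'\sin 3\beta)^{n} , \qquad (17)$$

$$b' = \frac{[\sin\psi_C(3 + \sin\psi_E)]^{1/n} - [\sin\psi_E(3 - \sin\psi_C)]^{1/n}}{[\sin\psi_C(3 + \sin\psi_E)]^{1/n} + [\sin\psi_E(3 - \sin\psi_C)]^{1/n}} , \qquad (18)$$

$$a' = \frac{1}{\sqrt{3}}\,\frac{2\sin\psi_C}{3 - \sin\psi_C}\,(1 + b')^{-n} . \qquad (19)$$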
Here the well-known Taylor rule, $\phi_C - \psi_C = \phi_E - \psi_E = \text{const}$, is used,
which is based on experimental evidence.
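A small numerical check of the Van Eekelen coefficients may help when implementing the loading and potential surfaces. The sketch below assumes the form $m = a(1 + b\sin 3\beta)^n$ with $a$ and $b$ built from the compression and extension angles as in this section; the function names are illustrative, and the angle values and $n$ are those quoted in the numerical example of Section 4.

```python
import math


def van_eekelen_coefficients(angle_c, angle_e, n):
    """Return (a, b) for compression/extension angles given in radians.

    Pass friction angles to obtain (a, b) and dilatancy angles for (a', b').
    """
    s_c, s_e = math.sin(angle_c), math.sin(angle_e)
    term_c = (s_c * (3.0 + s_e)) ** (1.0 / n)
    term_e = (s_e * (3.0 - s_c)) ** (1.0 / n)
    b = (term_c - term_e) / (term_c + term_e)
    a = (1.0 / math.sqrt(3.0)) * 2.0 * s_c / (3.0 - s_c) * (1.0 + b) ** (-n)
    return a, b


def van_eekelen_m(a, b, n, beta):
    """m = a * (1 + b * sin(3 * beta)) ** n, with beta in radians."""
    return a * (1.0 + b * math.sin(3.0 * beta)) ** n


phi_c, phi_e, n = math.radians(32.0), math.radians(52.0), -0.229
a, b = van_eekelen_coefficients(phi_c, phi_e, n)

m_plus = van_eekelen_m(a, b, n, math.radians(30.0))    # sin(3 * beta) = +1
m_minus = van_eekelen_m(a, b, n, math.radians(-30.0))  # sin(3 * beta) = -1

# Closed-form limits implied by the definitions of a and b.
r_c = 2.0 * math.sin(phi_c) / (math.sqrt(3.0) * (3.0 - math.sin(phi_c)))
r_e = 2.0 * math.sin(phi_e) / (math.sqrt(3.0) * (3.0 + math.sin(phi_e)))
```

With these definitions $m$ reduces to `r_c` at $\sin 3\beta = 1$ and to `r_e` at $\sin 3\beta = -1$, which gives a convenient consistency test for any implementation.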

3 Numerical Algorithm 

This section concerns a way of implementing Perzyna-type viscoplasticity
in Lagamine [5], a finite element code for large deformation inelastic analysis.
It is presumed that the strain history is specified and the objective is to
determine the corresponding stress history. Using a step-by-step time integration
scheme, a numerical solution algorithm is developed at the constitutive level.
The time increments used are large and the nonlinearity is much higher than
for the classical elastoplastic laws. The errors that are introduced by the
integration scheme can be significantly reduced by sub-incrementation. In the code
Lagamine the time increment $\Delta t = t_B - t_A$, where $B$ indicates the end of the
time step and $A$ its beginning, is divided into a constant number $N$ of
sub-intervals with a length $\Delta_N t$. For each sub-interval we have to integrate eq.
(2), whose incremental form is:

$$\Delta\sigma = C(\sigma)\,(\Delta\varepsilon - \Delta\varepsilon^{vp}) . \qquad (20)$$

The right side of (20) depends on $\sigma$ and the hardening function $\varpi$. The problem
posed is therefore to know which stress state and value of $\varpi$ to introduce in the
right side of (20). A generalised mid-point algorithm is adopted here, where the
viscoplastic strain increment is approximated by a one-parameter time integration
scheme as:

$$\Delta\varepsilon^{vp} = \Delta_N t\,\left[(1 - \theta)\,\dot{\varepsilon}^{vp}_{A} + \theta\,\dot{\varepsilon}^{vp}_{B}\right] , \qquad 0 \le \theta \le 1 . \qquad (21)$$

In each Gauss integration point the following operations are carried out:

1. Use the stress rate from the previous sub-interval $i - 1$ and the stress at
the beginning of the step $\Delta t$ to evaluate a mid-point stress $\sigma^{\theta}$:

$$\sigma^{\theta} = \sigma^{A} + \dot{\sigma}_{i-1}\,\theta\,\Delta_N t . \qquad (22)$$

Hardening parameters, $\varepsilon_v^{c}$ for the cap and $e_d$ for the failure regions, are
evaluated in the same manner. Using the unified notation $\varpi$ it reads:

$$\varpi^{\theta} = \varpi^{A} + \dot{\varpi}_{i-1}\,\theta\,\Delta_N t . \qquad (23)$$

For the first sub-interval, the stress rate and the hardening parameter rate
are initialised through the explicit or forward-Euler scheme.

2. Call the constitutive law to calculate approximate values of the stress and
hardening parameter rates:

$$\dot{\sigma}_i = C(\sigma^{\theta})\left(\dot{\varepsilon} - \dot{\varepsilon}^{vp}(\sigma^{\theta}, \varpi^{\theta})\right) , \qquad (24)$$

$$\dot{\varpi}_i = \dot{\varpi}\left(\dot{\varepsilon}^{vp}(\sigma^{\theta}, \varpi^{\theta})\right) . \qquad (25)$$

3. Calculate new stress and hardening parameter mid-point approximations:

$$\sigma^{\theta} = \sigma^{A} + \dot{\sigma}_{i}\,\theta\,\Delta_N t , \qquad (26)$$

$$\varpi^{\theta} = \varpi^{A} + \dot{\varpi}_{i}\,\theta\,\Delta_N t . \qquad (27)$$

4. Repeat 2. and 3. until convergence. It is supposed that convergence is
obtained after two iterations.

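5. Calculate the stress state at the end of the sub-interval:

$$\sigma^{B} = \sigma^{A} + \dot{\sigma}_{i}\,\Delta_N t , \qquad (28)$$

and update the hardening function:

$$\varpi^{B} = \varpi^{A} + \dot{\varpi}_{i}\,\Delta_N t . \qquad (29)$$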

6. Take into account a Jaumann correction based on the mid-point stress
value:

$$\sigma = \sigma + \omega\,\sigma + \sigma\,\omega^{T} . \qquad (30)$$

The updating procedure described above depends on the current value of
the stress invariant $I_\sigma$, which dictates the type of the activated deformation
mechanism. If $I_\sigma < L$ then the cap constitutive equations (5)-(10) are employed in
step 2, and for $I_\sigma > L$ equations (11)-(19) are used.
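The sub-incremented mid-point procedure can be sketched for a scalar problem. The code below is not the Lagamine implementation: it assumes a one-dimensional Perzyna-type law with linear overstress and linear hardening, takes the start of each sub-interval as the reference state, uses a full sub-step for the end-of-interval update, and omits the Jaumann correction. All names and parameter values are hypothetical.

```python
def viscoplastic_rates(sigma, kappa, strain_rate, p):
    """Constitutive law (step 2): return (stress rate, hardening rate)."""
    f = sigma - (p["sigma_y0"] + p["H"] * kappa)            # assumed loading function
    eps_vp_rate = p["gamma"] * max(f, 0.0) / p["sigma_ref"]
    return p["E"] * (strain_rate - eps_vp_rate), eps_vp_rate


def integrate_step(sigma_a, kappa_a, strain_rate, dt, n_sub, theta, p, n_iter=2):
    """Advance (sigma, kappa) over one time step dt using n_sub sub-intervals."""
    dt_n = dt / n_sub
    sigma, kappa = sigma_a, kappa_a
    # First sub-interval: rates initialised explicitly (forward Euler).
    sig_rate, kap_rate = viscoplastic_rates(sigma, kappa, strain_rate, p)
    for _ in range(n_sub):
        # Step 1: mid-point predictor from the previous sub-interval's rates.
        sig_mid = sigma + sig_rate * theta * dt_n
        kap_mid = kappa + kap_rate * theta * dt_n
        for _ in range(n_iter):
            # Step 2: rates evaluated at the mid-point state.
            sig_rate, kap_rate = viscoplastic_rates(sig_mid, kap_mid, strain_rate, p)
            # Step 3: improved mid-point state.
            sig_mid = sigma + sig_rate * theta * dt_n
            kap_mid = kappa + kap_rate * theta * dt_n
        # Step 5: state at the end of the sub-interval (full sub-step assumed).
        sigma += sig_rate * dt_n
        kappa += kap_rate * dt_n
    return sigma, kappa


params = {"E": 1000.0, "sigma_y0": 1.0, "H": 0.0, "gamma": 1.0, "sigma_ref": 1.0}

# Stress relaxation at fixed strain, starting above the static yield stress.
sigma_end, kappa_end = integrate_step(2.0, 0.0, 0.0, dt=0.01, n_sub=100,
                                      theta=0.5, p=params)

# Purely elastic loading below the yield stress.
sigma_el, kappa_el = integrate_step(0.0, 0.0, 1.0e-3, dt=0.1, n_sub=10,
                                    theta=0.5, p=params)
```

In the relaxation case the stress decays towards the static yield stress while the viscoplastic strain grows; in the elastic case no viscoplastic strain is produced. Two inner iterations are used, following the assumption stated in step 4.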

4 Numerical Example 

A finite element simulation of one-dimensional consolidation has been performed.
A plane strain state has been considered. The material property data are: mass
density of the solid skeleton $\rho = 2.647$ kN s^2/m^4 and initial porosity $n = 0.332$.
Elastic properties are characterised by the constants: E = 3.6 x 10® kPa, $\nu = 0.3$.
For the viscoplastic response the material constants are: $\phi_{Cf} = \phi_{C0} = 32°$,
$\phi_{Ef} = \phi_{E0} = 52°$, $c_f = c_0 = 10$ kPa, $B_p = 0.0001$, $B_c = 0.0002$,
$n = -0.229$, $\alpha_c = 0.9$, $\alpha_d = 0.1$, ω = 2.0 X 10“®, $\alpha_2 = 1.3$,
$\iota = 0.52$ and the reference stress Pa = 1.0 X 10® kPa. The sample measures
3.00 x 3.00 m. The finite element mesh and boundary conditions are shown in Fig. 1a.
For the numerical simulation a multistage rapid loading path followed by creep has been
applied such that for 0 < t < 4 s there is loading up to 1.2 MPa, for 4 s < t < 236 s the
load is kept constant, for 236 s < t < 240 s there is loading up to 2.4 MPa and for
t > 240 s there is creep under a constant load of 2.4 MPa. Fig. 1b illustrates a typical
variation of the vertical displacement with the loading history at nodal points 7 and 9.
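The multistage loading path can be written as a simple function of time. The sketch below assumes linear ramps during the two rapid loading stages, which the text does not state explicitly; the function name is illustrative.

```python
def applied_load(t):
    """Applied load in MPa at time t (in seconds) for the multistage path.

    Linear ramps are assumed during the two rapid loading stages.
    """
    if t <= 0.0:
        return 0.0
    if t < 4.0:
        return 1.2 * t / 4.0                      # first loading stage
    if t <= 236.0:
        return 1.2                                # first creep stage
    if t < 240.0:
        return 1.2 + 1.2 * (t - 236.0) / 4.0      # second loading stage
    return 2.4                                    # final creep stage


sample_times = [0.0, 2.0, 4.0, 100.0, 238.0, 240.0, 1000.0]
sample_loads = [applied_load(t) for t in sample_times]
```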

Fig. 1. a. Finite element mesh and boundary conditions for the numerical
example. b. Displacement versus time in nodal points 7 and 9 - a representative
result

5 Conclusions

The viscoplastic formulation and the numerical algorithm presented provide a
general format for extending inviscid models to Perzyna-type viscoplastic
constitutive relationships suitable for finite element applications. The problem of
extending Perzyna’s viscoplasticity to multisurface inelasticity is solved by
dividing the stress space into subspaces depending on the value of the first stress
invariant, and by defining for each subspace loading and potential functions that
properly model the deformation mechanism activated by stress states belonging
to the given subspace. As an application of this concept, a viscoplastic model
for highly porous rocks, capable of describing the experimentally observed shear
failure and collapse deformation mechanisms, is presented. The model is implemented
in the Lagamine finite element code. A numerical test example solving
one-dimensional nonlinear consolidation shows a reasonable prediction, qualitatively
representing the experimental observations during oedometer creep tests
on chalk [8]. Experience with the Lagamine FE code shows that the selection
of the time step length is very important for the accuracy of the solution.
The variable time stepping scheme, realized by the Lagamine automatic strategy, is
more advantageous for achieving solution accuracy. Further work is needed on
evaluating proper identification techniques and on the experimental verification of both
elastic and inelastic behaviour. The experimental data have to be sufficient for
performing a more precise nonlinear least-squares estimation procedure. Thus
it will be possible to compare the experimental and numerical results not only
qualitatively but also quantitatively.
Abstract. In this note we compare the sensitivity of six advanced solvers
for systems of nonlinear algebraic equations to the choice of starting vec- 
tors. We will report on results of our experiments in which, for each test 
problem, the calculated solution was used as the center from which we 
have moved away in various directions and observed the behavior of each 
solver attempting to find the solution. We are particularly interested in 
determining the best global starting vectors. Experimental results are 
presented and discussed. 



1 Introduction 

Recently there has been growing interest in engineering problems that result
in large systems of nonlinear algebraic equations. For instance, in a real-world
problem originating from avionics [11,12], a realistic model would require the solution
of 500+ equations, but due to the lack of convergence, the programs and methods
used by the authors were unable to solve systems of more than 64 equations.

The mathematical theory and computational practice are well established
when a system of linear algebraic equations or a single nonlinear equation is to
be solved [18]. This is clearly not the case for systems of nonlinear algebraic
equations. Our current research has shown a lack of both libraries of solvers and
standard sets of test problems (different researchers use different test problems
with only a small overlap). In this context we have to remember that, until
recently, only systems with relatively few equations were solved in engineering
practice. This explains one of the problems of the existing “popular” test
cases: most of them have a very small number of equations (2-10) and only very
few are defined so that they can reach 100 equations. In our earlier work [6, 7, 8, 9]
we have reported on our efforts to collect most of the existing solvers and apply
them to up to 22 standard test problems with the number of equations ranging
from 2 to 200. We were able to locate solvers based on Newton’s method and
its modifications, Brown’s method, bisection, continuation, hybrid algorithms, and the
homotopy and tensor methods, and applied each solver to the test problems
collected from the literature and the Internet. We were able to conclude that we
can exclude the simple algorithms and in-house implementations from further
testing, that methods like homotopy and continuation cannot be used as a black-box
approach without more work, and that the tensor method seemed to be the most
robust.

When the test problems were considered, we found that five of
them are easily solvable by all approaches and are thus useless for testing
purposes. The results for the remaining test problems allowed us to observe that
the choice of the starting vector has a strong effect on the solution process
(a bad selection of the starting vector can result in lack of convergence). Because
the likelihood of convergence depends on the solution method and the problem to
be solved, we decided to perform a behavior comparison of six advanced solvers
for the test problems identified earlier.

In this note, we will report on results of our experiments in which, for each 
test problem, the perturbed solution, and initial starting vectors of all ones, zeros 
and random numbers were used to observe the behavior of each solver attempting 
to find the solution. Based on these experiments we will try to establish which 
solvers can handle global convergence. 

The paper is organized as follows. Section 2 briefly describes the solvers
that were used in our work. In Section 3 we introduce the test problems. Section 4
summarizes the results of our numerical experiments, followed by concluding
remarks and a description of future work.



2 Solvers and Algorithms for Systems of Nonlinear 
Algebraic Equations 

As mentioned above, in our earlier work we have found that only the more
sophisticated algorithms are capable of solving test systems of nonlinear algebraic
equations (outside of the group of five easy ones). We are now focusing on
non-commercial versions of codes based on a hybrid algorithm and on Brown’s,
homotopy, continuation, and tensor methods. These algorithms are all documented in
ACM TOMS and briefly reviewed in [15], which also gives brief descriptions of the
codes; the original works in which the methods were proposed are cited in the
subsections below. Their implementations were obtained from the NETLIB repository [17].
We have modified (to handle up to 200 equations) the following software packages:
1) HYBRD, 2) SOS, 3) CONTIN, 4) HOMPACK, and 5) TENSOLVE. Recently
we have also discovered and added to this list the LANCELOT package, which
is a part of the NEOS environment [16].

We will now briefly summarize these algorithms and the solvers (in all cases
the references cited and [18] should be consulted for the details). We assume
that a system of $n$ nonlinear algebraic equations $f(x) = 0$ is to be solved, where
$x$ is an $n$-dimensional vector and 0 is the zero vector.
2.1 HYBRD

HYBRD is part of the MINPACK-1 suite of codes [13,14]. HYBRD’s design
is based on a combination of a modified Newton method and the trust region
method. Termination occurs when the estimated relative error is less than or equal
to the user-defined tolerance (we used the suggested default value of the
square root of the machine precision).

2.2 SOS 

SOS is a part of the SLATEC suite of codes [10]. SOS solves a system of $N$
simultaneous nonlinear equations in $N$ unknowns. It solves the problem $f(x) = 0$,
where $x$ is a vector with components $x(1), \ldots, x(N)$ and $f$ is a vector of
nonlinear functions. This code is based on an iterative method called Brown’s
method [2], which is a variation of Newton’s method using Gaussian elimination
in a manner similar to the Gauss-Seidel process. All partial derivatives required
by the algorithm are approximated by first difference quotients. The convergence
behavior of this code is affected by the ordering of the equations, and it is
advantageous to place linear and mildly nonlinear equations first in the ordering.
Convergence is roughly quadratic. This method requires a good choice for the
starting vector $x_0$.

2.3 CONTIN 

CONTIN, also known as PITCON [19], implements a continuation algorithm with
an adaptive choice of a local coordinate. A continuation method is designed
to be able to target more complicated problems and is the subject of various
research efforts [1,20]. This method is expected to be slower than line search and
trust region methods, but it is expected to be useful on difficult problems for which a
good starting point is difficult to establish. The method defines an easy problem
for which the solution is known, along with a path between the easy problem
and the hard problem that is to be solved. The solution of the easy problem is
gradually transformed to the solution of the hard problem by tracing this path.
The path may be defined by introducing an additional scalar parameter $\lambda$ into
the problem and defining a function

$$h(x, \lambda) = f(x) - (1 - \lambda) f(x_0) \qquad (1)$$

where $x_0 \in R^n$. The problem $h(x, \lambda) = 0$ is then solved for values of $\lambda$
between 0 and 1. When $\lambda = 0$, the solution is clearly $x = x_0$. When $\lambda = 1$,
we have $h(x, 1) = f(x)$, and the solution of $h(x, \lambda) = 0$ coincides with the
solution of the original problem $f(x) = 0$. The algorithm for constructing the path is
given in [19]. The convergence rate of continuation methods varies but, according to
the documentation, the method does not require a good choice of the initial vector $x_0$.
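The idea behind equation (1) can be sketched on a scalar example. The code below is not the CONTIN/PITCON algorithm: it assumes uniform steps in $\lambda$ with a plain Newton corrector instead of an adaptive local parameterization, and the test function $f(x) = \arctan x$ and all names are chosen here for illustration only.

```python
import math


def newton_scalar(func, dfunc, x, tol=1e-12, max_iter=50):
    """Plain Newton iteration; returns (x, converged)."""
    for _ in range(max_iter):
        step = func(x) / dfunc(x)
        x -= step
        if abs(x) > 1e6:          # iterates are running away
            return x, False
        if abs(step) < tol:
            return x, True
    return x, False


def continuation_solve(f, df, x0, n_steps=10):
    """Trace h(x, lam) = f(x) - (1 - lam) * f(x0) = 0 from lam = 0 to lam = 1."""
    fx0 = f(x0)
    x = x0
    for k in range(1, n_steps + 1):
        lam = k / n_steps

        def h(y, lam=lam):
            return f(y) - (1.0 - lam) * fx0

        # dh/dx equals df/dx, so the same derivative is reused.
        x, ok = newton_scalar(h, df, x)
        if not ok:
            return x, False
    return x, True


def f(x):
    return math.atan(x)


def df(x):
    return 1.0 / (1.0 + x * x)


direct_root, direct_ok = newton_scalar(f, df, 3.0)    # Newton from a poor start
path_root, path_ok = continuation_solve(f, df, 3.0)   # same start, via the path
```

From the starting point 3.0, Newton's method applied directly to $\arctan x = 0$ runs away, while following the path in ten steps keeps each intermediate problem close to its predecessor's solution and reaches the root at zero.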
2.4 HOMPACK

HOMPACK [21] is a suite of subroutines for solving nonlinear systems of
equations by homotopy methods [4]. The homotopy and continuation methods are
closely related. In the homotopy method, a given problem $f(x) = 0$ is embedded
in a one-parameter family of problems using a parameter $\lambda$ assuming values in
the range [0, 1]. As in the continuation method, the solution of an easy problem
is gradually transformed to the solution of the hard problem by tracing a
path. There are three basic path-tracking algorithms for this method: ordinary
differential equation based (code FIXPDF), normal flow (code FIXPNF), and
quasi-Newton augmented Jacobian matrix (code FIXPQF). The code is available
in both Fortran 77 and Fortran 90 [21]. The Fortran 77 version was used in our
tests. We tested all three approaches and, since the results were very close, we will
report FIXPDF results only.

2.5 TENSOLVE 

TENSOLVE [3] is a modular software package for solving systems of nonlinear
equations and nonlinear least-squares problems using the tensor method. It is
intended for small to medium-sized problems (up to 100 equations and unknowns)
in cases where it is reasonable to calculate the Jacobian matrix or its
approximations. This solver provides two different strategies for global convergence:
a line search approach (default) and a two-dimensional trust region approach.
The stopping criterion is met when the relative size of $x_{k+1} - x_k$ is less than
$\text{macheps}^{2/3}$, or $\|f(x_{k+1})\|_\infty$ is less than $\text{macheps}^{2/3}$, or the
relative size of $f'(x_{k+1})^T f(x_{k+1})$ is less than $\text{macheps}^{1/3}$; the solver
terminates unsuccessfully if the iteration limit is exceeded.
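A stopping test of this kind can be sketched as follows. The tolerance exponents are passed as defaults and should be treated as assumptions, as should the particular scaling used for the relative step; the gradient test is omitted because it needs the Jacobian. The function names are hypothetical and do not correspond to TENSOLVE routines.

```python
import sys

MACHEPS = sys.float_info.epsilon   # double-precision machine epsilon


def relative_step(x_new, x_old):
    """Largest componentwise |x_new - x_old| / max(|x_new|, 1) (assumed scaling)."""
    return max(abs(a - b) / max(abs(a), 1.0) for a, b in zip(x_new, x_old))


def should_stop(x_new, x_old, f_new,
                step_tol=MACHEPS ** (2.0 / 3.0),
                f_tol=MACHEPS ** (2.0 / 3.0)):
    """Return the name of the first satisfied test, or None to keep iterating."""
    if max(abs(v) for v in f_new) < f_tol:
        return "function"
    if relative_step(x_new, x_old) < step_tol:
        return "step"
    return None


reason_small_residual = should_stop([1.0, 1.0], [0.5, 1.0], [0.0, 0.0])
reason_small_step = should_stop([1.0], [1.0], [1.0])
reason_continue = should_stop([1.0], [0.5], [1.0])
```

In double precision the assumed tolerance `MACHEPS ** (2/3)` is a little under 4e-11, considerably tighter than the square root of the machine precision used as the HYBRD default.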

2.6 LANCELOT 

LANCELOT is one of the solvers available on the NEOS Web-based
environment [5,16]. The NEOS environment is a high speed, socket-based interface for
UNIX workstations that provides easy access to all the optimization solvers
available on the NEOS Server. This tool allows users to submit problems to the NEOS
Server directly from their local networks. Results are displayed on the screen.
LANCELOT is a standard Fortran 77 package for solving large-scale nonlinearly
constrained optimization problems. The areas covered by Release A of the
package are: unconstrained optimization problems, constrained optimization
problems, the solution of systems of nonlinear equations, and nonlinear least-squares
problems.

The software combines a trust region approach adapted to handle the bound
constraints, projected gradient techniques, and special data structures to
exploit the (group partially separable) structure of the underlying problem. It
additionally provides direct and iterative linear solvers (for Newton equations),
a variety of preconditioning and scaling algorithms for more difficult problems,
quasi-Newton and Newton methods, and provision for analytical and finite-difference
gradients.
3 Test Cases for Systems of Nonlinear Algebraic
Equations

In previous studies we were able to classify several of the test problems as easily
solvable by all methods. These included the Rosenbrock, Discrete Boundary
Value, Broyden Tridiagonal, Broyden Banded and Freudenstein-Roth
functions [17]. Since the fact that a solver is capable of solving them provides no
new information, we have decided to remove them from further consideration.
In our search for test problems we have come across problems of least squares
type as well as constrained and unconstrained optimization. We have decided
to concentrate our attention strictly on systems of nonlinear algebraic equations;
Table 1 contains the list of test problems used in our work.



Table 1. Test problems

 1. Powell singular function [17]         10. Variably dimensioned function [17]
 2. Powell badly scaled function [17]     11. Exponential/Sine Function [22]
 3. Wood function [17]                    12. Semiconductor Boundary Condition [22]
 4. Helical valley function [17]          13. Gulf Research and Development [17]
 5. Watson function [17]                  14. Extended Powell Singular [17]
 6. Chebyquad function [17]               15. Extended Rosenbrock [17]
 7. Brown almost-linear function [17]     16. Dennis, Gay and Vu [17]
 8. Discrete integral equation [17]       17. Matrix Square Root [17]
 9. Trigonometric function [17]



All codes are implemented in Fortran 77 and were run in double precision on
a PC with a Pentium Pro 200 MHz processor. When applying the five solvers we
have kept the default settings of all parameters as suggested in the
implementation (which matches our assumption of the solver being treated as black-box
software).

3.1 Simple Test Case 

In this study we examined the 17 test problems summarized in Table 1 by
studying the sensitivity of the solvers to the starting vectors. For each problem we
used the default vector, all ones, all zeros and random numbers as our initial starting
vectors. We used the default number of equations for each problem, which ranged
from 2 to 10 equations.

We were able to observe behavior patterns in the problems that converged,
which helped us determine which solvers are more adept at global
convergence. The results for each problem were typical of those of problem 8, the
Brown Almost-linear function. Figure 1 shows the number of iterations required
for convergence for each of the various testing methods used on this problem.
This problem shows that it is easy to converge with any solver as long as the
initial starting vector is within a certain range of the solution vector but, once outside
of that range, convergence did not occur for HYBRD, SOS, CONTIN, HOMPACK,
and LANCELOT. We applied the same test to the other problems and
found the pattern set by these problems to be consistent: solvability of the test
problems depends on the solver and the starting vector, except for TENSOLVE.
TENSOLVE appears to be more robust and was able to achieve
convergence regardless of the starting vector.
Fig. 1. Various Initial Starting Vectors for Problem 8



3.2 More Difficult Test Cases 

We ran a set of tests on all 17 problems. The starting vectors were defined by
adding percentages to a known solution in increments of 10%. For example, the
solution of Problem 8 is [1.0, ..., 1.0] for n=10. We ran the problem 10 times
using the initial vectors [1.1, ..., 1.1], [1.2, ..., 1.2], ..., [2.0, ..., 2.0]. Next
we repeated the process, subtracting percentages from a known solution in
increments of 10%. We then recorded the point at which non-convergence first occurred
in the positive direction (adding percentages) and in the negative direction
(subtracting percentages). If the solver always converged, we recorded the result as
100%; otherwise, we recorded the percentage distance from the exact solution
at which non-convergence occurred. We then noted that the behavior was similar
for all the test problems and that the convergence rate above and below the exact
solution for each problem per solver was less than 20%, as shown in Figure 2.
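The perturbation protocol can be sketched in a few lines. The code below does not use any of the six packages: it assumes an undamped Newton iteration with a forward-difference Jacobian as a stand-in solver, and applies it to the Brown almost-linear function, whose root is the vector of ones. All function names are illustrative.

```python
def brown_almost_linear(x):
    """Brown almost-linear function; x = (1, ..., 1) is a root."""
    n = len(x)
    total = sum(x)
    f = [x[i] + total - (n + 1) for i in range(n - 1)]
    prod = 1.0
    for v in x:
        prod *= v
    f.append(prod - 1.0)
    return f


def jacobian_fd(func, x, h=1e-7):
    """Forward-difference Jacobian, stored as a list of rows."""
    f0 = func(x)
    n = len(x)
    jac = [[0.0] * n for _ in range(n)]
    for j in range(n):
        xp = list(x)
        xp[j] += h
        fj = func(xp)
        for i in range(n):
            jac[i][j] = (fj[i] - f0[i]) / h
    return jac


def solve_linear(a, b):
    """Gaussian elimination with partial pivoting; returns None if singular."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        if abs(m[p][c]) < 1e-14:
            return None
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            factor = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= factor * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))
        x[r] = s / m[r][r]
    return x


def newton(func, x0, tol=1e-10, max_iter=50):
    """Undamped Newton iteration; returns (converged, iterations)."""
    x = list(x0)
    for it in range(max_iter + 1):
        f = func(x)
        if max(abs(v) for v in f) < tol:
            return True, it
        if it == max_iter:
            break
        step = solve_linear(jacobian_fd(func, x), [-v for v in f])
        if step is None or any(abs(s) > 1e12 for s in step):
            return False, it
        x = [xi + si for xi, si in zip(x, step)]
    return False, max_iter


def first_failure(func, solution, direction, solver=newton):
    """Percentage offset (10..100) at which the solver first fails, or 100
    if it converges for every offset, following the protocol in the text."""
    for pct in range(10, 101, 10):
        scale = 1.0 + direction * pct / 100.0
        ok, _ = solver(func, [scale * s for s in solution])
        if not ok:
            return pct
    return 100


solution = [1.0] * 10
above = first_failure(brown_almost_linear, solution, +1)   # adding percentages
below = first_failure(brown_almost_linear, solution, -1)   # subtracting percentages
```

Replacing `newton` by a wrapper around another solver gives the same sweep for that solver. Note that the last offset in the negative direction is the all-zeros starting vector.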

The results are rather interesting as they show that, in the experimental
setup used in our experiments, the tensor method outperforms the other solvers
(with the combination trust region/projected gradient technique coming second
and the hybrid method third) when looking at global convergence. The TENSOLVE,
LANCELOT, HYBRD and SOS codes appear to have been better designed to
handle various initial starting vectors.
Fig. 2. Average Percentage Convergence in Dependence of Starting Vector of
the Methods



4 Conclusions and Future Work 

In this note we have briefly reported on our experiments analyzing the sensitivity
to the initial starting vectors for standard test systems of up to 10 nonlinear
algebraic equations solved by five advanced solvers. Combining previous work
with this set of experiments, we have found that:

— solvability of the test problems depends on the solver and the starting vector, 

— problems which are not solvable using one method may be solvable by an- 
other method, and 

— of the solvers tested, the tensor method based solver appeared to be most 
robust. 

Similar results were achieved for larger numbers of equations in a previous
publication [6].

Our future work will concentrate on expanding the tensor method based 
solver (as the most promising one) to handle very large systems. We will also 
continue our search for solvers that can handle medium to large systems of 
nonlinear algebraic equations as well as new interesting test problems that can 
be recommended to study the robustness of the nonlinear solvers. We will apply 
these solvers to the original avionics problem and observe their performance. 
Abstract. Modern investment processes often use quantitative models
based on Markowitz’s mean-variance approach for determining optimal
portfolio holdings. A major drawback of using such techniques is that
the optimality of the portfolio structure only holds with respect to a
single set of expected returns. Becker, Marty, and Rustem introduced
the robust min-max portfolio optimization strategy to overcome this
drawback. It computes portfolio holdings that guarantee a worst case
risk/return tradeoff whichever of the specified scenarios occurs. In this
paper we extend the approach to include transaction costs. We illustrate
the advantages of the min-max strategy on balanced portfolios. The
importance of considering transaction costs when rebalancing portfolios is
shown. The experimental results illustrate how a portfolio can be insured
against a possible loss without sacrificing too much upside potential.



1 Introduction 

One of the most widely used models for determining optimal portfolio holdings
with respect to a tradeoff between expected return and risk is the mean-variance
approach introduced by Markowitz [3]. The basic argument behind the model is
that investors hold portfolios with the highest expected return for a given level
of variance. Although the mean-variance portfolio optimization model allows
optimal portfolio holdings to be computed, the optimality only holds with respect to
a single set of expected returns. Furthermore, slight changes in the expected
returns have a big impact on the optimal portfolio holdings.

In [5] Becker, Marty, and Rustem introduced the robust min-max portfo- 
lio optimization strategy (min-max strategy) to overcome this problem. Their 
framework considers a finite set of possible expected returns, called scenarios. 
It computes portfolio holdings that guarantee a worst case risk/return tradeoff 
whichever of the specified scenarios occurs. In contrast with Markowitz’s ap- 
proach, the min-max strategy only gives a lower bound on the expected return 
for a given variance or risk. 

Using a framework like the min-max strategy proves very useful for determining
short-term portfolio structure changes, for example, for balanced portfolios^1.
It allows certain assets or asset classes to be over- and underweighted, based on
different return forecasts. As any portfolio modification induces costs, we extended
the min-max strategy to take transaction costs into account.

* The views expressed in this paper are those of the authors and do not necessarily
reflect the opinion of Credit Suisse Asset Management.

^1 A balanced portfolio is a portfolio consisting of equities, bonds, and cash.

Other approaches currently under investigation for computing efficient port- 
folio holdings, with respect to a given utility function, are stochastic multi-period 
optimization models [2] , models based on continuous time methods using partial 
differential equations [4], or factor models [1], to name just the most important 
ones. 

2 The Min-Max Strategy 

We review the min-max portfolio strategy introduced by Becker, Marty, and
Rustem [5]. Consider a set of individual assets or asset classes. Consider a set $S$
of return scenarios, each scenario representing a specific view of the market
outcome. For example, one scenario could be based on expectations of an interest rate
rise, or of a slowdown of a specific region’s economic growth. Let $r_s$ be the
expected returns of the asset classes considered, with respect to scenario $s \in S$.
We assume that the investor is interested in performance relative to a given
benchmark portfolio, rather than absolute performance^2. Let $p$ be the investor’s
initial portfolio holdings, $b$ the considered benchmark portfolio, $t$ the
transaction costs in percent, and $A$, $c$ represent general constraints. Let $Q$ be the
covariance matrix associated with the asset classes. It may be estimated using
historical data or using volatility and/or correlation forecast models. For the
sake of simplicity, only a single covariance matrix is used. In the most general
setting, the min-max strategy allows for multiple covariance matrices, one for
each forecasted return scenario.

The robust min-max portfolio optimization strategy can be formulated as

$$\begin{array}{ll}
\min\limits_{\omega,\,\omega^+,\,\omega^-} & (\omega - b)'Q(\omega - b) \\
\text{subj. to} & \forall s \in S:\ r_s'(\omega - b) - t'(\omega^+ + \omega^-) \ge R \\
 & \mathbf{1}'\omega = 1 \\
 & \omega = p + \omega^+ - \omega^- \\
 & \omega \ge 0,\ \omega^+ \ge 0,\ \omega^- \ge 0 \\
 & A\omega \le c .
\end{array} \qquad (1)$$

The unknown weights $\omega$ represent the portfolio holdings, and $\omega^+$ and
$\omega^-$ the buy and sell decisions. $R$ represents the lower bound of the expected
return defining the risk/return tradeoff. It is called the min-max return. The
portfolio holdings $\omega^*$, solution of problem (1) for a given $R$, are called
min-max optimal portfolio holdings. We assume that the investor is fully invested
and that no short positions are allowed. Instead of fixing $R$, most investors
require $e(\omega) = (\omega - b)'Q(\omega - b)$, the portfolio tracking error, to be
bounded by some constant. In this case, we iteratively solve problem (1) for
different values of $R$ by using a

^ Most institutional investors evaluate their performance against the performance of 
the market, represented by an index or benchmark like, for example, the S&P 500. 



240 Claude Diderich and Wolfgang Marty 



Table 1. Considered indices modeling the five asset classes used, benchmark 
structure, initial portfolio holdings, as well as three return scenarios 



Index                                   b      p     scen. 1   scen. 2   scen. 3
Salomon SFr. 3 month money market       5%    10%      2%        2%        2%
Salomon SFr. Gov. Bond 1+              40%    30%      3%        5%        5%
Salomon World Gov. Bond                15%    10%      3%        3%        5%
FT Switzerland                         25%    30%      7%        7%        8%
FT World ex. Switzerland               15%    20%      7%        5%        7%



bisection algorithm. Such an approach can be used because $R = f(e(\omega^*))$
is a monotone increasing function for min-max optimal portfolio holdings. The
graphical representation of $f$ is called the min-max frontier.
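
The bisection step can be sketched as follows. This is only an illustration under stated assumptions: `tracking_error_at` is a hypothetical callable that solves problem (1) for a given R and returns the tracking error of the resulting holdings, and it is assumed to be monotone nondecreasing in R on the bracket `[r_lo, r_hi]`. The quadratic program itself is not solved here.

```python
def bisect_min_max_return(tracking_error_at, e_max, r_lo, r_hi,
                          tol=1e-8, max_iter=200):
    """Largest R in [r_lo, r_hi] whose tracking error stays within e_max.

    tracking_error_at(R) is a hypothetical stand-in for "solve problem (1)
    for this R and return the tracking error of the optimal holdings".
    It is assumed to be monotone nondecreasing in R.
    """
    if tracking_error_at(r_lo) > e_max:
        raise ValueError("even the lower bracket violates the tracking error bound")
    if tracking_error_at(r_hi) <= e_max:
        return r_hi  # the whole bracket is admissible
    for _ in range(max_iter):
        mid = 0.5 * (r_lo + r_hi)
        if tracking_error_at(mid) <= e_max:
            r_lo = mid  # mid is admissible, search higher returns
        else:
            r_hi = mid  # mid is too risky, search lower returns
        if r_hi - r_lo < tol:
            break
    return r_lo


# Toy usage with an artificial monotone relation e(R) = R^2.
result = bisect_min_max_return(lambda R: R * R, e_max=0.04, r_lo=0.0, r_hi=1.0)
```

Each iteration costs one solve of problem (1), so the number of quadratic programs grows only logarithmically with the requested accuracy in R.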

It can be shown that adding any return scenario $r = \sum_{s \in S} \lambda_s r_s$,
where $\sum_{s \in S} \lambda_s = 1$ and $\lambda_s \ge 0$, to $S$ does not change
the optimal solution of problem (1).
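
A short argument for this invariance, given here as a sketch and assuming the return constraint of problem (1) has the form $r_s'(\omega - b) - t'(\omega^+ + \omega^-) \ge R$: for any point that is feasible for the original scenario set, the constraint of the added scenario is a convex combination of constraints that already hold, so the feasible set, and hence the minimizer, is unchanged.

```latex
\begin{aligned}
r'(\omega - b) - t'(\omega^+ + \omega^-)
  &= \sum_{s \in S} \lambda_s \, r_s'(\omega - b)
     - \Big(\sum_{s \in S} \lambda_s\Big) t'(\omega^+ + \omega^-) \\
  &= \sum_{s \in S} \lambda_s \big[ r_s'(\omega - b) - t'(\omega^+ + \omega^-) \big] \\
  &\ge \sum_{s \in S} \lambda_s R \;=\; R .
\end{aligned}
```

The first equality uses $\sum_{s \in S} \lambda_s = 1$, and the inequality uses $\lambda_s \ge 0$.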

Although problem (1) does not have a complex structure, solving it is not 
an easy task. This is especially due to the fact that the Hessian matrix of the 
quadratic program, although positive semi-definite, is singular. Furthermore, for 
small values of R, the objective function takes values close to zero. For the 
experiments described in this paper, we relied on the CPLEX barrier quadratic 
programming algorithm. 

3 Experimental Results 

To illustrate the min-max strategy, we consider a portfolio based on the asset 
classes represented by the indices in Table 1. These asset classes are common 
for Swiss pension fund portfolios. The initial portfolio p as well as the consid- 
ered benchmark b are also shown in Table 1. For illustrative purposes, we use 
the return scenarios shown in Table 1. All computations are done using a one 
month horizon. Input data as well as all the results, unless otherwise stated,
are annualized for reading convenience. The correlations between asset classes 
are estimated using ten years of historical monthly data (1988-97). All data is 
provided by Datastream, converted to Swiss francs where necessary. 
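
As a small worked example of the quantities involved, the sketch below evaluates the relative return $r_s'(p - b)$ of the initial portfolio against the benchmark for the three scenarios of Table 1. It is an illustration only: the variable names are ours, transaction costs are ignored (no trade is made), and the covariance matrix is not needed for this step.

```python
# Table 1 data in percent: benchmark b, initial portfolio p, scenario returns.
# Asset order: money market, SFr. gov. bonds, world gov. bonds,
#              equity Switzerland, equity world ex. Switzerland.
b = [5, 40, 15, 25, 15]
p = [10, 30, 10, 30, 20]
scenarios = {
    1: [2, 3, 3, 7, 7],
    2: [2, 5, 3, 7, 5],
    3: [2, 5, 5, 8, 7],
}


def relative_return(weights, benchmark, returns):
    """Relative return r'(w - b) in percent; all inputs are given in percent."""
    return sum(r * (w - bw) / 100.0
               for r, w, bw in zip(returns, weights, benchmark))


relative = {s: relative_return(p, b, r) for s, r in scenarios.items()}
worst_scenario = min(relative, key=relative.get)  # scenario with smallest relative return
```

The smallest of the three values is the quantity that the return constraint in problem (1) bounds from below by R.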

3.1 The Min-Max Frontier

In Fig. 1 we illustrate the min-max frontier, which, for a given risk, represents the 
worst return to expect for a min-max optimal portfolio. Indeed, whichever of the 
three specified scenarios occurs, the relative return obtained is at least as large 
as the min-max return. Furthermore, the scenario giving a return equal to the 
min-max return is not always the same. Indeed, for small tracking error values,
scenario one gives the smallest relative return, whereas for large tracking error 
values scenario two gives the smallest relative return. The min-max strategy 
maximizes the worst case expected relative return. 
[Figure 1. Horizontal axis: annualized tracking error. Curves: min-max frontier;
min-max portfolios in scenario 1, scenario 2, and scenario 3.]



Fig. 1. A clipping of the min-max frontier as well as the relative return of the 
min-max optimal portfolio holdings evaluated using the three return scenarios 



On the other hand, as illustrated in Fig. 2, consider the mean-variance
efficient frontier using scenario one. If a portfolio on this frontier is chosen
but a different scenario occurs, for example scenario two, the resulting return is
considerably worse than that of the min-max optimal portfolio having the same
risk. The min-max approach computes portfolios that guarantee a minimal relative
return with respect to all the given scenarios. For this insurance against a
potential loss a certain premium has to be paid. In general the premium is less
than the potential loss, but mathematically there exists no non-trivial relation
between them.




[Figure 2. Horizontal axis: annualized tracking error. Curves: min-max frontier;
frontier 1 in scenario 1, scenario 2, and scenario 3.]



Fig. 2. Relative returns obtained from min-max optimal portfolio holdings com- 
pared to relative returns obtained from portfolio holdings selected on the mean- 
variance efficient frontier computed using scenario one 
Fig. 3. Comparison of the premium paid using the min-max strategy to insure
against potential loss

In Fig. 3 we illustrate the premium to be paid for the potential loss insurance.
If scenario one occurs, the premium paid by choosing a min-max portfolio instead
of a portfolio on the efficient frontier with respect to scenario one is about 12
basis points^ at an annualized tracking error level of 3.4%. On the other hand,
if the same two portfolios are compared but scenario two occurs, the min-max
portfolio insures against a potential loss of 20 basis points annualized at the
same tracking error level of 3.4%.

3.2 The Effect of Transaction Costs 

Up to now we have not considered transaction costs. However, if the min-max
strategy is used for tactical asset allocation^, transaction costs must not be
neglected. To illustrate this situation, we consider a portfolio which we want to
rebalance to a min-max optimal portfolio. We set the transaction costs to 50 basis
points for all asset classes, except for money market, where we use 20 basis
points. The situation is illustrated in Fig. 4, using the scenarios from Table 1.
We compute min-max optimal portfolios with and without considering transaction
costs. We then compare the expected annualized returns of both, adjusted for
transaction costs. The additional gain from considering transaction costs for the
example in Fig. 4 is around 50 basis points annualized for tracking errors between
0.2% and 1%, and increases even further for larger tracking errors. The effect of
transaction costs is noticeable when the transaction costs are of the same order
of magnitude as the expected relative returns. In any case, considering
transaction costs never degrades the solution.

^ One basis point is an alternative notation for 0.01%. 

A tactical asset allocation decision is a decision to change the portfolio structure such 
as to take advantage of expected short term movements in the markets. Usually the 
horizon for tactical asset allocation decisions does not exceed one month. 
[Figure 4. Horizontal axis: annualized tracking error. Curves: min-max frontier
without considering transaction costs; the same frontier with transaction costs
deducted; min-max frontier considering transaction costs; the same frontier with
transaction costs deducted.]

Fig. 4. The effect of transaction costs on the expected relative min-max return 

3.3 Backtesting the Min-Max Strategy 

To illustrate the validity of the min-max strategy in a dynamic context, we
simulate its application over a two year horizon. The chosen time period, from
January 1998 to May 2000, includes the Russian crisis in fall 1998. The
portfolio is rebalanced monthly, when necessary. For this experiment we assume
the benchmark as well as the initial portfolio structure shown in Table 1. The
covariance matrix is estimated as described previously and kept constant over the
simulation horizon. We use three scenarios, which we calculate as 1) one month
monthly historical returns, 2) three month monthly historical returns, and 3) one
year monthly historical returns at the date of rebalancing. The portfolio selected
each month is such that its annualized tracking error does not exceed 1%.

In Fig. 5 we show the evolution of the value of an investment of 100 Swiss 
francs starting in January 1998. Realized returns are used for the computation. 
Furthermore we present the returns obtained when choosing the portfolio on the 
efficient frontier associated with each single scenario. 

Fig. 6 illustrates the structure of the min-max portfolio over the two year 
simulation horizon. From a portfolio manager’s perspective, these changes are 
reasonable and implementable. During the optimization, no explicit restrictions 
on the maximal turnover, except for transaction costs of 50 basis points for all 
asset classes, except for money market (20 basis points), were used. 

4 Conclusion 

In this paper we have presented the min-max strategy for computing optimal
portfolio holdings. The computed portfolio holdings guarantee a certain expected
return with respect to a given set of scenarios. They represent portfolios insured
against downside risk without sacrificing too much upside potential. We have
illustrated how to take into account transaction costs during the computation of
efficient portfolios.







[Figure 5. Horizontal axis: time, January 1998 to May 2000. Curves: benchmark,
min-max strategy, and strategies 1, 2, and 3.]



Fig. 5. Evaluation of the min-max strategy over a two year period and compar- 
ison with mean-variance efficient frontier strategies 




[Figure 6. Legend: equity world, equity Switzerland, bonds world,
bonds Switzerland, money market.]

Fig. 6. Asset allocation structure of the min-max portfolios over the two year 
simulation period 



References 

1. J. Y. Campbell, A. W. Lo, and A. C. MacKinlay. The Econometrics of Financial
Markets. Princeton University Press, Princeton, NJ, 1997.

2. G. Consigli and M. A. H. Dempster. The CALM stochastic programming model
for dynamic asset-liability management. In Worldwide Asset and Liability
Modeling, chapter 19, pages 464-500. Cambridge University Press, Cambridge,
United Kingdom, 1988.



The Min-Max Portfolio Optimization Strategy 245 



3. H. Markowitz. Portfolio Selection: Efficient Diversification of Investments. John
Wiley, New York, NY, 1959.

4. R. C. Merton. Continuous-Time Finance. Blackwell, Malden, MA, 1992.

5. B. Rustem, R. Becker, and W. Marty. Robust min-max portfolio strategies for rival
forecast and risk scenarios. Journal of Economic and Dynamic Control, to appear.



Convergence Rate for a Convection Parameter 
Identified Using Tikhonov Regularization 



Gabriel Dimitriu 

University of Medicine and Pharmacy, Faculty of Pharmacy, 
Department of Mathematics and Informatics, 6600 Iasi, Romania
dimitriu@umfiasi.ro



Abstract. In this paper we establish a convergence rate result for a
parameter identification problem. We show that the convergence rate of
a convection parameter in an elliptic equation with Dirichlet boundary
conditions is $O(\sqrt{\delta})$, where $\delta$ is a norm bound for the noise in the data.



1 Introduction 

In this study we present a convergence rate result for a parameter identification
problem. To be precise, we show that the convergence rate of the convection
parameter $b$ in the elliptic equation

$$-(a u_x)_x + b u_x + c u = f \quad \text{in } (0,1), \tag{1}$$

with Dirichlet boundary conditions $u(0) = u(1) = 0$ is $O(\sqrt{\delta})$, where
$\delta$ is a norm bound for the noise in the data $f$. This parameter represents the
solution of the identification problem associated with (1) and regularised by the
Tikhonov method.
We take / € L^(0,1), {a,b,c) e Q C Q = x x 

with Q endowed with the Hilbert-space product topology and Q = {(a, 6, c) S 
Q : 0 < a < a{x), |a|vyi, 2 ( 0 , 1 ) < |&| wc 2 (o,i) < c(x) > c> 0 a.e. in (0, 1)}. 



2 Functional Framework 

Following the functional framework described in [4] we consider the nonlinear
ill-posed problem

$$F(q) = f_0. \tag{2}$$

By ill-posedness, we always mean that the solutions do not depend continuously
on the data. Here $F : \mathrm{Dom}(F) \subset X \to Y$ is a nonlinear operator between
Hilbert spaces $X$ and $Y$. We assume that the operator $F$ satisfies the following
conditions:

(i) $F$ is continuous and

(ii) $F$ is weakly (sequentially) closed, i.e. for any sequence $\{q_n\} \subset \mathrm{Dom}(F)$,
weak convergence of $q_n$ to $q$ in $X$ and weak convergence of $F(q_n)$ to $f$ in $Y$
imply $q \in \mathrm{Dom}(F)$ and $F(q) = f$.
We use the concept of a $q^*$-minimum-norm solution $q_0$ for the problem (2):

Definition 1. Let $q^* \in X$ be fixed. We say that $q_0$ is a $q^*$-minimum-norm solution
($q^*$-MNS) for (2) if

$$\|F(q_0) - f_0\| = \min\{\|F(q) - f_0\| : q \in \mathrm{Dom}(F)\} \tag{3}$$

and

$$\|q_0 - q^*\| = \min\{\|q - q^*\| : \|F(q) - f_0\| = \|F(q_0) - f_0\|\}. \tag{4}$$

A solution of (3) and (4) need not exist and, even if it does, it need not be 
unique, because of the nonlinearity of F. Also we note that q* plays an important 
role in obtaining the solutions defined by (3) and (4). Thus, the choice of q* can 
influence which (least-squares) solution we want to approximate. In the situation 
of multiple least-squares solutions, q* plays the role of a selection rule. In what 
follows we assume existence of a $q^*$-minimum-norm least-squares solution for
the unperturbed data $f_0 \in Y$.

To cope with the ill-posedness of the problem (2) we shall use the well known
Tikhonov regularisation. By this method a solution for (2) is approximated by
a solution of the nonlinear regularised optimization problem

$$\min_{q \in \mathrm{Dom}(F)} \left\{ \|F(q) - f_\delta\|^2 + \alpha \|q - q^*\|^2 \right\}, \tag{5}$$

where $\alpha > 0$ is a small parameter and $f_\delta \in Y$ is an approximation to the exact
right-hand side $f_0$.

For computational reasons, since problem (5) can only be solved approximately,
we slightly generalise it by considering the problem of finding an element
$q_\alpha^{\delta,\eta} \in \mathrm{Dom}(F)$ such that

$$\|F(q_\alpha^{\delta,\eta}) - f_\delta\|^2 + \alpha\|q_\alpha^{\delta,\eta} - q^*\|^2 \le \|F(q) - f_\delta\|^2 + \alpha\|q - q^*\|^2 + \eta, \tag{6}$$

for all $q \in \mathrm{Dom}(F)$, where $\eta \ge 0$ is a small parameter. Obviously, for $\eta = 0$, the
problem (6) is equivalent to (5).
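
To make the regularised functional concrete, the sketch below treats a deliberately simplified special case that is not the setting of this paper: a linear operator given by a 2x2 matrix, for which the minimiser of $\|Aq - f\|^2 + \alpha\|q - q^*\|^2$ solves the normal equations $(A^T A + \alpha I) q = A^T f + \alpha q^*$. The function name and the example data are ours.

```python
def tikhonov_2x2(A, f, q_star, alpha):
    """Minimise ||A q - f||^2 + alpha ||q - q_star||^2 for a 2x2 matrix A.

    Linear toy case only: solves (A^T A + alpha I) q = A^T f + alpha q_star
    by Cramer's rule.
    """
    (a11, a12), (a21, a22) = A
    # Entries of the symmetric matrix A^T A + alpha I.
    m11 = a11 * a11 + a21 * a21 + alpha
    m12 = a11 * a12 + a21 * a22
    m22 = a12 * a12 + a22 * a22 + alpha
    # Right-hand side A^T f + alpha q_star.
    r1 = a11 * f[0] + a21 * f[1] + alpha * q_star[0]
    r2 = a12 * f[0] + a22 * f[1] + alpha * q_star[1]
    det = m11 * m22 - m12 * m12
    return ((m22 * r1 - m12 * r2) / det, (m11 * r2 - m12 * r1) / det)


# With A = I and q_star = 0 the minimiser is f / (1 + alpha).
q_reg = tikhonov_2x2(((1.0, 0.0), (0.0, 1.0)), (1.0, 2.0), (0.0, 0.0), alpha=1.0)
```

The example shows the role of $q^*$ as a selection rule: as $\alpha$ grows the minimiser is pulled towards $q^*$, and as $\alpha \to 0$ it approaches the least-squares solution.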

Aspects of stability, convergence and convergence rates (as $\alpha \to 0$) have been
extensively studied in the literature, both in the linear and nonlinear case, e.g.
in [1], [2], [3], [4], [6], [8]. Under the given assumptions on the operator $F$ and
using compactness-type arguments it was proved in [4] that problem (6) admits a
stable solution in the sense of continuous dependence of the solutions on the
data $f_\delta$ and that the solutions of (6) converge towards a solution of (2) as
$\alpha \to 0$ and $f_\delta \to f_0$.

3 Convergence Rate Result 

We now focus on the convergence rate analysis. The theorem below gives sufficient
conditions for a rate $\|q_\alpha^{\delta,\eta} - q_0\| = O(\sqrt{\delta})$ for the regularised solutions.
Theorem 1. ([4]) Let $\mathrm{Dom}(F)$ be convex, let $f_\delta \in Y$ with $\|f_\delta - f_0\| \le \delta$ and
let $q_0$ be a $q^*$-MNS. Moreover, let the following conditions hold:

(i) $F$ is Fréchet differentiable,

(ii) there exists $L > 0$ such that

$$\|F'(q_0) - F'(q)\| \le L\|q_0 - q\|, \quad \text{for all } q \in \mathrm{Dom}(F),$$

(iii) there exists $w \in Y$ satisfying

$$q_0 - q^* = F'(q_0)^* w,$$

(iv) $L\|w\| < 1$.

Then for the choices $\alpha \sim \delta$ and $\eta = O(\delta^2)$, we obtain:

$$\|q_\alpha^{\delta,\eta} - q_0\| = O(\sqrt{\delta}).$$

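If $F$ is twice Fréchet differentiable, conditions (ii) and (iv) may be replaced
by the weaker condition

(ii)'

with $\rho < 1$.

To see this, note that the left-hand side of (ii)' equals $2(w, r_\alpha^{\delta,\eta})$ with
$r_\alpha^{\delta,\eta}$ as in the relation

$$F(q_\alpha^{\delta,\eta}) = F(q_0) + F'(q_0)(q_\alpha^{\delta,\eta} - q_0) + r_\alpha^{\delta,\eta}.$$

Taking into account the conditions (i) and (ii) of Theorem 1 we have

$$\|r_\alpha^{\delta,\eta}\| \le \tfrac{1}{2} L \|q_\alpha^{\delta,\eta} - q_0\|^2.$$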

In the specific setting given by problem (2) the parameter $b$ plays the role
of $q$ and the operator $F$ is given by the mapping parameter $\mapsto$ solution, that
is $F(q) := u(b)$. For $b \in \mathrm{Dom}(F)$, let $A(b) : H^2(0,1) \cap H_0^1(0,1) \to L^2(0,1)$ be
defined by

$$A(b)\varphi = -(a\varphi_x)_x + b\varphi_x + c\varphi.$$
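
For intuition, the parameter-to-solution map can be approximated numerically. The sketch below is an illustration under simplifying assumptions that are ours, not the paper's: $a$ and $c$ are constants, $b$ and $f$ are given as Python callables, and the equation $-(a u_x)_x + b u_x + c u = f$ with $u(0) = u(1) = 0$ is discretised by central differences on a uniform grid and solved with the Thomas algorithm.

```python
import math


def solve_forward(b, f, a=1.0, c=1.0, n=100):
    """Approximate u(b) for -(a u')' + b u' + c u = f on (0,1), u(0)=u(1)=0.

    Central differences on n subintervals; a and c are constants here
    (a simplification). Returns interior nodes and nodal values.
    """
    h = 1.0 / n
    xs = [i * h for i in range(1, n)]
    lower = [-a / h**2 - b(x) / (2 * h) for x in xs]  # coefficient of u_{i-1}
    diag = [2 * a / h**2 + c for _ in xs]             # coefficient of u_i
    upper = [-a / h**2 + b(x) / (2 * h) for x in xs]  # coefficient of u_{i+1}
    rhs = [f(x) for x in xs]
    m = len(xs)
    # Thomas algorithm: forward elimination ...
    for i in range(1, m):
        w = lower[i] / diag[i - 1]
        diag[i] -= w * upper[i - 1]
        rhs[i] -= w * rhs[i - 1]
    # ... and back substitution.
    u = [0.0] * m
    u[-1] = rhs[-1] / diag[-1]
    for i in range(m - 2, -1, -1):
        u[i] = (rhs[i] - upper[i] * u[i + 1]) / diag[i]
    return xs, u


# Manufactured solution u(x) = sin(pi x) with a = c = 1 and b = 1.
f_exact = lambda x: (math.pi**2 + 1.0) * math.sin(math.pi * x) + math.pi * math.cos(math.pi * x)
xs, u = solve_forward(lambda x: 1.0, f_exact)
max_err = max(abs(ui - math.sin(math.pi * x)) for x, ui in zip(xs, u))
```

Evaluating `solve_forward` for nearby values of $b$ gives a finite-difference check of the derivative formulas that follow.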

To apply the convergence rate result given in Theorem 1, we next calculate
the first and second order Fréchet derivatives of the function $b \mapsto u(b)$.

Lemma 1. The mapping $b \mapsto u(b)$ from $W^{1,2}(0,1)$ into $W^{2,2}(0,1)$ is Fréchet
differentiable with the Fréchet differential with increment $h$ denoted by
$\delta_b u(b)h := \eta(h)$, with $\eta(h)$ the unique solution of

$$-(a\,\eta_x(h))_x + b\,\eta_x(h) + c\,\eta(h) = -h\,u_x(b) \quad \text{in } (0,1), \tag{7}$$

with boundary conditions

$$\eta(h)(0) = \eta(h)(1) = 0.$$
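Proof. The verification is quite standard but we include it for the purpose of
completeness (see [7]). We define the sets $B := \{b \in W^{1,2}(0,1) : |b|_{W^{1,2}} \le \mu\}$
and $B_\delta := \{b \in W^{1,2}(0,1) : |b|_{W^{1,2}} \le \mu + \delta\}$. Let $h \in W^{1,2}(0,1)$ and $b \in B$
and note that there exists $\varepsilon(h) > 0$ such that for any $\varepsilon \in (0, \varepsilon(h))$ the element
$b + \varepsilon h \in B_\delta$ and $u(b + \varepsilon h)$ exists. Set $u^\varepsilon = \varepsilon^{-1}(u(b + \varepsilon h) - u(b))$ and observe
that $u^\varepsilon$ must satisfy

$$-(a u^\varepsilon_x)_x + b u^\varepsilon_x + c u^\varepsilon = -h\,u_x(b + \varepsilon h), \tag{8}$$

$$u^\varepsilon(0) = u^\varepsilon(1) = 0.$$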

It follows that u{b + eh) — > u{b) in 1) as e ^ 0 and that u® converges 

weakly in and thus strongly in W^’^(0,1). We denote this limit by 

r]{h). As a consequence of u® satisfying (8) and the limit behavior of u^, we 
obtain that rt® — > r]{b) as e ^ 0 strongly in as well. Thus, the limit 

$\eta(h)$ is the Gâteaux derivative of the mapping $b \mapsto u(b)$ with increment $h$, i.e.
$\eta(h) = \delta_b u(b)h$, and it satisfies

$$-(a\,\eta_x(h))_x + b\,\eta_x(h) + c\,\eta(h) = -h\,u_x(b) \quad \text{in } (0,1), \tag{9}$$

with boundary conditions

$$\eta(h)(0) = \eta(h)(1) = 0.$$

The application $h \mapsto \eta(h)$ is a bounded linear operator from $W^{1,2}(0,1)$ to
$W^{2,2}(0,1)$. That $h \mapsto \eta(h)$ is the Fréchet differential of $u$ at $b$ with increment $h$
can be verified as follows: let $h \in W^{1,2}(0,1)$ with $|h|_{W^{1,2}(0,1)} \le \delta$, and set

$$\Delta(h) := |h|^{-1}_{W^{1,2}(0,1)}\,\big(u(b+h) - u(b) - \eta(h)\big).$$

We note that $\Delta(h)$ satisfies

$$-(a\,\Delta_x(h))_x + b\,\Delta_x(h) + c\,\Delta(h) = -\frac{h}{|h|_{W^{1,2}(0,1)}}\,\big(u_x(b+h) - u_x(b)\big),$$

$$\Delta(h)(0) = \Delta(h)(1) = 0.$$

From $|u(b)|_{W^{2,2}(0,1)} \le C|f|_{L^2(0,1)}$ we have

$$|\Delta(h)|_{W^{2,2}(0,1)} \le C\,|u_x(b+h) - u_x(b)|_{L^2(0,1)},$$

which implies that $|\Delta(h)|_{W^{2,2}(0,1)} \to 0$ whenever $|h|_{W^{1,2}(0,1)} \to 0$. Thus, the
lemma is established. □

In a similar way one can prove that the second order Fréchet derivative of
$b \mapsto u(b)$ is the bilinear mapping denoted by $\xi(b)(h,h) := \delta_b^2 u(b)(h,h)$, which
satisfies the equation

$$-(a\,\xi_x(h,h))_x + b\,\xi_x(h,h) + c\,\xi(h,h) = -2h\,\eta_x(h), \tag{10}$$

with boundary conditions

$$\xi(h,h)(0) = \xi(h,h)(1) = 0.$$

Then $F(b) = u(b) = A(b)^{-1}f$ and we have:

$$F'(b)h = -A(b)^{-1}(h\,u_x(b)), \tag{11}$$

$$F''(b)(h,h) = 2A(b)^{-1}\big[h\,\big(A(b)^{-1}(h\,u_x(b))\big)_x\big]. \tag{12}$$

Moreover, the adjoint F'{b)* is given by 

F'{b)*h= -u^{b)A{b)-^h. 

Therefore, condition (in) of Theorem 1 takes the form 

bo — b* = F'{bo)*w = —Ua:{bo)A{bo)~^w, for some w S T^(0, 1). 

We note that such an element w exists if 

^^Gi72(0,l)ni?i(0,l) (13) 

'^x j 

and is given by 

w = A{bo) ^ Y (14) 

Ux{bo) 

Turning to condition (ii)' we shall show that a certain bound on 

b*-bo 
V '.= 

Uxibo) 

will imply that for a ^ S, r] = 0{5^) there exists p < 1 such that 

2(wj" F"{b*){ht\h^^^){l-t)dt] < p\\h^^\\h^o,iy ( 15 ) 

\ JO J L2(0,l) 

for all (5 > 0 sufficiently small. We make the notations: 

:= - bo and b* := bo + 

Since b plays the role of q, b^'^ is of course defined as in the previous section. 

The left-hand side of (15) will be denoted by E^'^. For the estimation of 
we shall use the following facts. By Theorem 1, and hence from 

are uniformly bounded in L^(0,1) for 5 > 0 sufficiently small. Since (see [1]), 
A{b)~^ : T^(0,1) ^ i?^(0,l) n i?o(0, 1) is uniformly bounded for b in bounded 
sets of T^(0, 1), this implies that there exists K such that 

lk(5‘)||ff2(o,i) < WMbP ^llL2(o,i).ff2(o,i)nffi(o,i)ll/lli^(o.i) - ^\\f\\L2(o,i), 



for all t G [0, 1]. 
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Let K := ||A(&o) ^llL 2 ( 04 )_/f 2 (Q In the following we use the esti- 
mates: 

ll/fflli2(o,i) < ll/llL2(04)||g||_L=o(04) and || 5 ||l=o(o 4 ) < ( 16 ) 

for all / G L^(0, 1) and g G ff^(0, 1) n Lfo(0, 1). From the expression of the first 
order Frechet derivative given by (11) we obtain that 

A(bT^g = A(bo)-^g - t f A{h^T\h^<i^{A{b'T^9)^) ds, (17) 

Jo 

for any g G L^{0, 1). By Theorem 1, b^'^ — > bo, for a ~ <5 and g = 0{S^), which 
we assume from now on. With 

K = ||24(&o)||_H-2(o_i)n_f/i(o.i),L2(o,i) 

and using (16), (17) together with u{b*) = A(6‘)“^/, we obtain 

< 4|kllL^(o,i) sup P(6o)24(&‘)-i[hi’"A(6‘)-i(hi-''rr,(&‘))]|U2(o,i) 

te[o,i] 

< 4||r;|U2(o,i) ( sup \\hf^^ A{bT\ht^u,,ib^))\\mo,i) 



+K sup \\tA{b*T^[ht^A{b*T^ (hfi^A{bTHht^u,{b*)))J\\H2ioA) 

t,sG[0.1] ^ 

< 4||r;|U2(o,r) ( ||hi’’'|U 2 (o,i)^ sup \\A{bY\h^^^u,,m\\H^o,i) 

\ 4v3 tG[o,i] 



+ K 



- ( KM^Wlho,!)' 



4^3 



|L2(o.l) 



< 4||r;|U2(o,r) 



ll^a’'llL=(0,l) 



-I- 



4v^ 
l!^a''IU2(o,i) 



^ll^a’'IU"(0.1) sup ||M^(fo*)||i2(o_i) 

iG[0,l] 






< 4||r;||L2(o,i) 



/ ^^4 ll/IU-(Ci) ^ ^ WfhHogA ..,||3 



192 V3 



192^3 



‘a llL2(0.1) 



K 



^^^\\da'^\\L^(0,l) ^ [^lhx(^o)||i“(0.1) + 

■ sup \\t{A{b*THhfiyA{b*T^f),))jH^^o,i) 

t,sG[0.1] 
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< 



K 



■^lkl|i2(o,l)||Ua;(&o)||L“(0,l)ll^a’*lli2(0,l) 






Therefore, 



< ^|klU^(0.i)h.(MllL<»(0.i)l|/i^’’’lli2(0,i) + C? {Wh^^Whio,!)) • 

so that condition (iz)' is satisfied for 5 > 0 sufficiently small, provided that 



K 

7! 



b*-bn 



c(^o) 



L2(o,l) 



|"«a;(&o)||L“( 0 ,l) < 1) 



or equivalently 



b*-bo 

Ux{bo) 



< 

L2(o,l) 



K\\ux{bo)\\L^{o,i) 



(18) 



The condition (18) can be interpreted in the following manner. The difference
between $b^*$ and $b_0$ has to be sufficiently small not only globally, by the complete
estimate, but also locally, in the sense that the estimate $q^*$ has to be better where
the expression $|u_x(b_0)|$ is small.



Remark 1. The general result concerning the convergence rate of the estimated
parameter $q$ in the operator equation $F(q) = f$ also remains valid in the case
when $F$ is a monotone and hemicontinuous operator (see [5]).
Abstract. Finite element spaces are constructed that allow for different
levels of refinement in different subdomains. In each subdomain the mesh 
is obtained by several steps of uniform refinement from an initial global 
coarse mesh. The approximation properties of the resulting discrete space 
are studied. 

Computationally feasible, bounded extension operators, from the inter- 
face into the subdomains, are constructed and used in the numerical 
experiments. These operators provide stable splitting of the composite 
(global) finite element space into local subdomain spaces (vanishing at 
the interior interfaces) and the “extended” interface finite element space. 
They also provide natural domain decomposition type preconditioners 
involving appropriate subdomain and interface preconditioners. 
Numerical experiments for 3-d elasticity illustrating the properties of the 
proposed discretization spaces and the algorithm for the solution of the 
respective linear system are also presented. 



1 Discretization 

Let $\Omega \subset \mathbb{R}^3$ be a polyhedral domain and assume that it is subdivided into
disjoint tetrahedra forming an initial coarse triangulation $\mathcal{T}_0$. Applying
successively some refinement procedure to $\mathcal{T}_0$ we obtain a sequence of nested
quasiuniform triangulations $\mathcal{T}_0, \mathcal{T}_1, \mathcal{T}_2, \dots$ which have geometrically
decreasing mesh parameters $h_0 > h_1 > h_2 > \dots$.

Next, let $\{\Omega_i\}_{i=1}^{s}$ be a non-overlapping decomposition of $\Omega$:

$$\overline{\Omega} = \bigcup_{i=1}^{s} \overline{\Omega}_i, \qquad \Omega_i \cap \Omega_j = \emptyset \ \text{ for } i \ne j.$$

We will assume that each subdomain $\Omega_i$ is a coarse mesh domain, i.e., it is
completely covered by elements from $\mathcal{T}_0$.

We use Lagrangian finite elements of a given polynomial degree $m \ge 1$ over
the triangulations $\mathcal{T}_j$ to define the approximation spaces $V_j$.

L. Vulkov, J. Wasniewski, and P. Yalamov (Eds.): NAA 2000, LNCS 1988, pp. 253—264, 2001. 

(c) Springer- Verlag Berlin Heidelberg 2001 
For each subdomain $\Omega_i$ we choose a number of refinement levels $l_i$, for
$i = 1, \dots, s$.

Let $\{l_1, \dots, l_s\}$ be the list of level numbers sorted in ascending order.

We define the following auxiliary domains: 

Oi= fij, = U = yj ilj . for i = 1, . . . , s . 

Note that the original domain f2 can be divided into the following disjoint 
subsets n = 0,U0( = U O" U o(. 

We now introduce the spaces 

14 = G Vj., such that v\oi_i = o| , for i = 1, . . . , s , 

and then define the approximation space of our main interest by the sum 

Vh = Vi + V 2 + • • • + Vg . 



This space consists of continuous functions and it is a subspace of Inside 

each subdomain Vh\Qi consists of all the functions in Vl^\o■ whose trace 
on ^^2i belongs to a coarser space depending on the levels of the neighboring 
subdomains. In particular Vh\dOi Q VuldOi- 

Theorem 1. Let u G where m is the degree of Lagrangian elements 

we used to define the spaces Vj. Denote by hi the mesh parameter (diameter of 
tetrahedra) of the triangulation (which is the triangulation for all subdomains 

[2j such that Ij = k), for i = 1, . . . ,s. This implies that hi+\ < qhi for some 

fixed q G (0, 1). Define also the boundaries Gi = OiC\ Oi , i = 1, . . . , s — 1. 

The following estimate for the best approximation of u with functions from Vh 
holds: 



inf 

tih&Vh 



||m 



Uh\\l,0 < Ch(^\u\m+l,Of + 

i^l 



s-1 

+ Y^C{l + qnhT 

i=l 



inf 






The proof follows from standard arguments utilizing the approximation proper- 
ties of the local subspaces. 



2 Linear Elasticity 

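In this section we use the discrete space defined in the previous section to
discretize a linear elasticity problem. The problem is posed as follows: let
$\lambda, \mu \in L_\infty(\Omega)$ be functions that are uniformly positive in $\Omega$, called the
Lamé coefficients; let also $\mathbf{f} \in (L_2(\Omega))^3$ be some given body force and
$\mathbf{g} \in (L_2(\Gamma_N))^3$ be some given surface force on a part of the boundary
$\Gamma_N \subset \partial\Omega$; the rest of the boundary $\Gamma_D = \partial\Omega \setminus \Gamma_N$ is assumed to have
positive surface measure. The problem then reads:

Find the displacement $\mathbf{u} \in (H^1(\Omega))^3$ which satisfies:

$$a(\mathbf{u}, \mathbf{v}) = \Phi(\mathbf{v}), \quad \forall \mathbf{v} \in (H^1(\Omega))^3 : \mathbf{v}|_{\Gamma_D} = 0, \qquad \mathbf{u}|_{\Gamma_D} = 0,$$

where

$$a(\mathbf{u}, \mathbf{v}) = \int_\Omega 2\lambda\, \varepsilon(\mathbf{u}) : \varepsilon(\mathbf{v}) + \mu\, \mathrm{div}\,\mathbf{u}\; \mathrm{div}\,\mathbf{v},$$

$$\Phi(\mathbf{v}) = \int_\Omega \mathbf{f} \cdot \mathbf{v} + \int_{\Gamma_N} \mathbf{g} \cdot \mathbf{v}.$$

Here $\varepsilon(\mathbf{u}) = \{\varepsilon_{ij}(\mathbf{u})\}_{i,j=1}^{3}$ is the linearized strain tensor which is defined by the
equality:

$$\varepsilon_{ij}(\mathbf{v}) = \tfrac{1}{2}(\partial_j v_i + \partial_i v_j), \qquad \mathbf{v} = (v_1, v_2, v_3).$$

It is well known that $a(\cdot,\cdot)^{1/2}$ defines a norm on the space
$\mathbf{V} = \{\mathbf{v} \in (H^1(\Omega))^3 : \mathbf{v}|_{\Gamma_D} = 0\}$ which is equivalent to the $(H^1(\Omega))^3$-norm; that is,

$$a(\mathbf{v}, \mathbf{v}) \simeq \|v_1\|_{1,\Omega}^2 + \|v_2\|_{1,\Omega}^2 + \|v_3\|_{1,\Omega}^2, \quad \forall \mathbf{v} \in \mathbf{V}. \tag{1}$$

We discretize the problem by replacing the space $\mathbf{V}$ with its finite dimensional
subspace $\mathbf{V}_h \cap \mathbf{V}$, where $\mathbf{V}_h = (V_h)^3$. The discrete problem reads:

Find $\mathbf{u}_h \in \mathbf{V}_h \cap \mathbf{V}$ such that:

$$a(\mathbf{u}_h, \mathbf{v}_h) = \Phi(\mathbf{v}_h), \quad \forall \mathbf{v}_h \in \mathbf{V}_h \cap \mathbf{V}. \tag{2}$$

Using the norm equivalence (1) it is easy to obtain an estimate for the error
$\|\mathbf{u} - \mathbf{u}_h\|_{1,\Omega}$ similar to the error estimate for the scalar case given in the previous
section.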
3 Extension Mappings 

Our aim is to define an efficient parallel algorithm for solving the system of 
linear equations (2), which will be based on the given non-overlapping domain 
decomposition To handle the case of inexact subdomain solves (or pre- 

conditioners) we use the technique studied in [6] , [4] , which exploits computable 
extension mappings. 

We call the union of all boundaries of the subdomains $\{\partial\Omega_i\}$ the interface and
denote it by $\Gamma$:

$$\Gamma = \bigcup_{i=1}^{s} \partial\Omega_i.$$
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Let E?i : Vft,|r V/j be an extension operator, that is: 

(E,v^)|^ = v^ Vv^eV.lz^. 

Using E^, we represent V/j as a direct sum: 

V;,=E;,(V^|^)0V^ 

where 

V°={vhGVn:vn\r = 0} ■ 

The space V° can also be represented as a direct sum of the following spaces: 

V° = {v/, € Vh : ^h\72\o, = 0 } , i = 1, . . . , s . 

In this way, V^, is decomposed into the direct sum: 

V/,=V°©V>---©V°0E;,(V;,|^) . 

It is obvious that V° and V° are orthogonal with respect to the inner product 
a(-, •) when i yf j. In general, this is not true for the spaces V° and E^ ( V/i|^). 
That is why, we impose the following boundedness condition on E^: 

a(E/,(v?,|p),E?i(v/,|^)) < ?7a(v/,,v/,) , Vv/^eV/^nV, (3) 

with constant t] > 1 independent of the discretization parameters Note 

that (3) is simply boundedness of E^ in energy norm. This condition is equivalent 
to the following strengthened Cauchy-Schwarz inequality: 

1 . 

a (E^vt v°) < (1 - ' « (E/.vt E;,v^) " a (v°, v°) ^ , (4) 

Vv^€ (V;,nV)|z^, Vv° €V^ 

We will consider vector extension mappings in which each of the scalar compo- 
nents in extended separately, that is E;j has the form: 

Ehvl^^) , > 

where E/j : I4,|r ^ 14 is a bounded scalar extension mapping: 

\\Euvlh,a<C\\vih r = C inf |4||i,r2, ^vl&Vu\r. 

2’ 

Using the last inequality and the norm equivalence (1), it is easy to prove that 
(3) holds with constant 77 independent of The extension mappings Eh are 

naturally defined subdomain by subdomain. We start with bounded extension 
mappings 

El, : Vi,\ao, ^ Vi,\a, . 
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which are defined on the uniformly refined space Vi^\qq^ and their image is also 
contained in an uniformly refined space - Vi- \ . Such operators are easily con- 

structed (as we will see later), and this is generally a well-established technique. 

To define the global extension operator Eh we need local extension operators 
from the space Vh\dQi acting into The definition of Vu implies that: 

Vh\dQi ^ and 14]^^ = {vh € ViilQi ■ f/tlan, G yh\dOi} 

and therefore 

K{Vh\dn.)cVh\a,. 

This fact allows us to define Eh in the following way: if G Vh\r then 
{^hvi)\^_= El{v\\on,) , i = 

One can estimate the norm of Eh, in a straightforward manner, in terms of the 
norm of the individual components 



4 Multilevel Extension Mappings 



In this section we briefly consider the definition of two types of multilevel exten- 
sion operators (cf., [2], [3], and [5]). 

For simplicity of notation, we will define an extension operator Eh from dfi 
into the whole domain 17 at some arbitrary refinement level 1: Eh : Vi\dn — > V). 

In this section we will use the notation V)? = Vk\dn- A general multilevel 
extension operator is defined as follows: let Vk : be linear operators 

(with ri = I and r_i = 0) and E^ : Vj! — > 14 be the trivial extension with zeros 
in the nodes of 7^ inside 17. The multilevel extension mapping Eh : Vi\do Vi 
based on the decomposition operators r/j is defined by the sum: 

i 

Eh = ^El{vh - Th-i) . 
fc =0 

It is known that if {rk\ satisfy the norm equivalence (where stands for the 
mesh size of Tjt) 

k=0 



then the corresponding multilevel extension operator Eh is uniformly bounded. 

We next define the two computationally feasible decomposition operators 
that we used in the numerical experiments: 

— let be the set of the nodal basis functions of the space and define the 
mappings qk : L 2 {dfl) Vj^, fc = 0, 1, . . . by the equality 



QkV 



E 



{y, (t>)o,as7 
(1, (t>)o,dn 



yv e L2{dn) . 
If we take $r_k = q_k$ then (5) holds and the corresponding extension operator
is bounded.

- to define the second example we introduce the discontinuous spaces

$$V_{k,+}^\partial = \{ v \in L_2(\partial\Omega) : v|_T \in P_m(T), \ \forall T \in T_k^\partial \} ,$$

where $P_m(T)$ stands for the set of all polynomials of degree $\le m$ over the
triangle $T$ and $T_k^\partial$ is the set of all triangles of the triangulation $T_k$ restricted
to $\partial\Omega$. Note that $V_k^\partial$ is a proper subset of $V_{k,+}^\partial$. We define the projections
$p_k : V_{k,+}^\partial \to V_k^\partial$ by averaging about the nodes $x$,

$$(p_k v_{k,+})(x) = \frac{\sum_{T \ni x} |T| \, v_{k,+}|_T(x)}{\sum_{T \ni x} |T|} ,$$

where $|T|$ is the measure (the area) of $T$. If we denote by $q_{k,+}$ the $L_2(\partial\Omega)$-orthogonal
projection on $V_{k,+}^\partial$ then we take $r_k = p_k q_{k,+}$. It can be proven
that $\{r_k\}$ are uniformly bounded (in the $\|\cdot\|_{0,\partial\Omega}$ norm) projection operators
and, as a corollary, that the norm equivalence (5) holds.
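The averaging step behind $p_k$ can be illustrated with a small sketch. It is a one-dimensional simplification, not the authors' implementation: the "triangles" are intervals, $|T|$ is an interval length, and the function name `average_at_nodes` and the data layout are assumptions made only for this example.

```python
def average_at_nodes(node_coords, elements, element_values):
    """Measure-weighted nodal averaging of a discontinuous piecewise function.

    node_coords    : list of node coordinates (1D simplification)
    elements       : list of (left_node, right_node) index pairs
    element_values : list of (value_at_left, value_at_right) per element,
                     i.e. the discontinuous function restricted to each element
    Returns the continuous nodal values
        sum_{T containing x} |T| * v|_T(x)  /  sum_{T containing x} |T|.
    """
    weighted_sum = [0.0] * len(node_coords)
    total_measure = [0.0] * len(node_coords)
    for (left, right), values in zip(elements, element_values):
        measure = abs(node_coords[right] - node_coords[left])  # |T|
        for node, value in zip((left, right), values):
            weighted_sum[node] += measure * value
            total_measure[node] += measure
    return [s / m for s, m in zip(weighted_sum, total_measure)]


# Two elements of lengths 1 and 2 whose values disagree at the shared node 1.
nodal = average_at_nodes([0.0, 1.0, 3.0], [(0, 1), (1, 2)], [(1.0, 2.0), (4.0, 5.0)])
```

At the shared node the result is the length-weighted mean of the two one-sided values, and a function that is already continuous is returned unchanged, which is the projection property used above.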



5 Preconditioning 



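In order to solve the discrete problem (2) we have to reformulate it into matrix-vector
form by choosing a basis in the space $V_h \cap V$. Let $\Phi_1, \Phi_2, \ldots, \Phi_s$ and $\Phi_b$
be bases respectively in the spaces

$$V_1^0, \ V_2^0, \ \ldots, \ V_s^0, \ \text{and} \ (V_h \cap V)|_\Gamma ,$$

then the set

$$\Phi_1 \cup \Phi_2 \cup \cdots \cup \Phi_s \cup \Phi_b$$

is a basis in the space $V_h \cap V$. In this basis the stiffness matrix has the following
$2 \times 2$ block structure:

$$A = \begin{pmatrix} A_0 & A_{0b} \\ A_{b0} & A_b \end{pmatrix} \begin{matrix} \}\ \Phi_1 \cup \Phi_2 \cup \cdots \cup \Phi_s \\ \}\ \Phi_b \end{matrix}$$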

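The inequality (4) is equivalent to the strengthened Cauchy inequality for $A$: with a
constant $\gamma < 1$ determined by the constant in (4),

$$v_0^T A_{0b} v_b \le \gamma \, (v_b^T A_b v_b)^{1/2} (v_0^T A_0 v_0)^{1/2} , \qquad \forall v_b, v_0 ,$$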

and therefore $A$ is spectrally equivalent to its block diagonal part. Moreover,
if $M_0$ and $M_b$ are spectrally equivalent to $A_0$ and $A_b$ respectively then the block
additive and block multiplicative preconditioners

$$M_A = \begin{pmatrix} M_0 & 0 \\ 0 & M_b \end{pmatrix}, \qquad M_M = \begin{pmatrix} M_0 & 0 \\ A_{b0} & M_b \end{pmatrix} \begin{pmatrix} I & M_0^{-1} A_{0b} \\ 0 & I \end{pmatrix}$$

are also spectrally equivalent to $A$.
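How the two block preconditioners are applied to a residual can be sketched as follows. This is only an illustration under stated assumptions: the test matrix is random, and the block solvers `solve_M0` and `solve_Mb` are diagonal (Jacobi) stand-ins for the multigrid preconditioners described in the text; none of these names come from the paper.

```python
import numpy as np


def apply_additive(solve_M0, solve_Mb, r0, rb):
    """Return M_A^{-1} r: one independent solve per diagonal block."""
    return solve_M0(r0), solve_Mb(rb)


def apply_multiplicative(solve_M0, solve_Mb, A0b, Ab0, r0, rb):
    """Return M_M^{-1} r for M_M = [[M0, 0], [Ab0, Mb]] @ [[I, M0^{-1} A0b], [0, I]]."""
    y0 = solve_M0(r0)               # forward sweep, interior block (first solve with M0)
    xb = solve_Mb(rb - Ab0 @ y0)    # forward sweep, interface block
    x0 = y0 - solve_M0(A0b @ xb)    # backward sweep (second solve with M0)
    return x0, xb


# Small SPD test matrix split into a 3x3 interior block and a 2x2 interface block.
rng = np.random.default_rng(0)
G = rng.standard_normal((5, 5))
A = G @ G.T + 5.0 * np.eye(5)
A0, A0b, Ab0, Ab = A[:3, :3], A[:3, 3:], A[3:, :3], A[3:, 3:]

# Stand-in block preconditioners (assumption): diagonal approximations of A0 and Ab.
M0, Mb = np.diag(np.diag(A0)), np.diag(np.diag(Ab))
solve_M0 = lambda v: np.linalg.solve(M0, v)
solve_Mb = lambda v: np.linalg.solve(Mb, v)

r = rng.standard_normal(5)
x0, xb = apply_multiplicative(solve_M0, solve_Mb, A0b, Ab0, r[:3], r[3:])
a0, ab = apply_additive(solve_M0, solve_Mb, r[:3], r[3:])
```

The multiplicative variant calls `solve_M0` twice per application and the additive variant once, which is the cost difference between $M_M$ and $M_A$.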

The block $A_0$ is easily preconditioned because it is block diagonal with blocks
corresponding to the spaces $V_i^0$ ($i = 1, \ldots, s$) which have multilevel structure

$$V_{i,0}^0 \subset V_{i,1}^0 \subset \cdots \subset V_{i,l_i}^0 = V_i^0 ,$$

where

$$V_{i,k}^0 = \{ v \in V_k|_{\Omega_i} : v|_{\partial\Omega_i} = 0 \} .$$

Therefore multilevel and multigrid methods can be used for the preconditioning
of the blocks of $A_0$. In the numerical experiments we used V-cycle multigrid with
one pre- and one post-smoothing iteration per level.

The preconditioning of the block $A_b$ is a more complicated task. Without
going into details we will give just an idea of the algorithm we used in the
numerical experiments. Namely, we apply the idea of multigrid preconditioning
of locally refined spaces considered in [1], but here we apply it to the interface
space $V_h|_\Gamma$. In the multigrid algorithm the following sequence of nested spaces
is used:

$$V_{h,0}|_\Gamma \subset V_{h,1}|_\Gamma \subset \cdots \subset V_{h,l}|_\Gamma = V_h|_\Gamma ,$$

where the spaces $V_{h,k}$ are defined exactly as the space $V_h$ with the
only difference that the levels $\{l_i\}$ in the subdomains are replaced
with the coarser levels $\{k_i : k_i = \min(l_i, k)\}$; the last level $l$ is chosen to be the
smallest number for which $V_{h,l}|_\Gamma = V_h|_\Gamma$, that is for which

$$\Gamma = \bigcup_{l_i \le l} \partial\Omega_i .$$

In the spaces Yh,k\r the following varying (non-inherited) symmetric, positive 
definite forms are used to define the multigrid algorithm: 

J e Yh,k\r, 

where the extension mappings Eh^k '■ Yh,k\r Y k,k are defined in a way similar 
to the way was defined. 

In the space Yh,k\r we smooth only in the region of F where Yh^k\r is finer 
than Yh,k-i\r- This region is the non-empty set F \ dOi. 

We finish this section with the remark that both the multiplication of $A$ with
a vector and the solution of a system with $M_A$ (or $M_M$) can be carried out in
parallel. Each subdomain corresponds to a processor that calculates the local actions
(of $A$ or of the inverse of the preconditioner). In addition, communication between
neighboring subdomains is required for the assembling of the global actions.



6 Numerical Experiments 

Fig. 1. Cube partitioning into six tetrahedra

We present numerical results for two linear elasticity problems in the unit cube
$\Omega = (0, 1)^3$. The $k$th level triangulation $T_k$ is obtained in the following way: first



we divide C into 2^ x 2^ x 2^' equal cubes and then each cube is partitioned 
into six tetrahedra as illustrated in Figure 1. With these triangulations we use
quadratic Lagrangian finite elements to define the spaces $V_k$, i.e. $m = 2$.

Note that when quadratic FE are used in the spaces on the 2-dimensional boundaries
$\partial\Omega_i$ some of the nodal basis functions $\phi$ have vanishing integral, i.e. $(1, \phi)_{0,\partial\Omega_i} = 0$.
Therefore the operators $q_k$ cannot be defined in this case. Instead, we used
$q_{k+1}$ defined for linear FE with the triangulation $T_{k+1}$, that is we used the
decomposition operators $I_k^{21} q_{k+1} I_k^{12}$ where $I_k^{12}$ and $I_k^{21}$ are the operators
defining the natural bijection between the space of linear FE over $T_{k+1}$ and the
space of quadratic FE over $T_k$. Namely, these two spaces have the same set of
nodes and this bijection simply exchanges the two basis functions attached to a node,
the linear and the quadratic. This is illustrated in Figure 2.





Fig. 2. Replacing piecewise linear function with quadratic function and vice 
versa 




The second decomposition operator we defined, $p_k q_{k,+}$, can be defined for both
linear and quadratic FE. In Table 1 we give the three different extension mappings
used in the numerical experiments. Comparing the results for $E_2$ and $E_3$
we can see the effect of the replacement of quadratic FE with linear.



Table 1. Extensions used

          $E_1$                          $E_2$            $E_3$
 $r_k$    $I_k^{21} q_{k+1} I_k^{12}$    $p_k q_{k,+}$    $I_k^{21} p_{k+1} q_{k+1,+} I_k^{12}$
To solve the linear systems we used the preconditioned conjugate gradient
(PCG) algorithm. The stopping criterion was

r^M“^r < 10“^®r|^M“^ro 

where $r$ is the current residual, $r_0$ is the initial one, and $M$ is the preconditioner
used ($M_A$ or $M_M$).

The Message Passing Interface (MPI) was used for the parallel implementation
of the algorithm.
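A serial sketch of PCG with a stopping test of the same form, $r^T M^{-1} r \le \mathrm{tol} \cdot r_0^T M^{-1} r_0$, may help fix ideas. It is not the authors' MPI code: the function name `pcg`, the default value of `rel_tol`, and the Jacobi preconditioner used in the demonstration are assumptions made for this example.

```python
import numpy as np


def pcg(A, b, apply_Minv, rel_tol=1e-12, max_iter=500):
    """Preconditioned conjugate gradients for a symmetric positive definite A.

    Stops when r^T M^{-1} r <= rel_tol * r0^T M^{-1} r0 (or after max_iter steps).
    Returns the approximate solution and the number of iterations performed.
    """
    x = np.zeros_like(b, dtype=float)
    r = b - A @ x                    # initial residual r0
    z = apply_Minv(r)                # preconditioned residual
    rz = r @ z
    rz0 = rz
    p = z.copy()
    iterations = 0
    while iterations < max_iter and rz > rel_tol * rz0:
        Ap = A @ p
        step = rz / (p @ Ap)
        x += step * p
        r -= step * Ap
        z = apply_Minv(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p    # new search direction
        rz = rz_new
        iterations += 1
    return x, iterations


# Demonstration on a 1D Laplacian with a Jacobi (diagonal) preconditioner.
n = 20
A_demo = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b_demo = np.ones(n)
x_demo, its_demo = pcg(A_demo, b_demo, lambda r: r / np.diag(A_demo))
```

Passing a different `apply_Minv`, for instance a block routine for $M_A$ or $M_M$, changes only the preconditioning step and leaves the rest of the loop as is.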

Test problem 1. We take the following geometry and Lame coefficients: 

= (0,1)3, rN = {o<x,y<i,z = i}, rD = dn\rN, a = | y = ^ 
and the following components for the displacement:

$$u_1(x, y, z) = 0 ,$$
$$u_2(x, y, z) = \sin(\pi x) \sin(\pi y) \sin(\pi z) ,$$
$$u_3(x, y, z) = (1 - x) x (1 - y) y (1 - z) z .$$

We divide the domain into $s = 2 = 1 \times 1 \times 2$, $s = 4 = 2 \times 2 \times 1$, $s = 8 = 2 \times 2 \times 2$,
and $s = 16 = 2 \times 2 \times 4$ subdomains. In each subdomain we take an equal
number of refinement levels $l_i = l$, $i = 1, \ldots, s$. Thus the mesh is uniform in the
whole domain. In Table 2 we give the number of iterations made by the PCG
algorithm when the three different extension mappings were used. One can see



Table 2. Iterations with $E_1$, $E_2$, and $E_3$ and additive preconditioner

                 $E_1$                     $E_2$                     $E_3$
 s \ h    1/4  1/8  1/16  1/32     1/4  1/8  1/16  1/32     1/4  1/8  1/16  1/32
   2       32   35    37    38      44   52    56    60      40   47    50    52
   4       29   34    37    38      43   55    62    66      42   52    57    59
   8       28   33    36    37      44   53    61    65      41   51    56    59
  16        -   43    43    44       -   68    70    71       -   55    58    61


that the number of iterations increases when the mesh parameter $h$ decreases
and when $s$ increases, but there is a tendency for stabilizing. Notice the slight
jump of iterations when $s$ is increased from 8 to 16. This is due to the change of
the initial level: when $s = 16$, $T_0$ has $6 \times 4 \times 4 \times 4$ tetrahedra, while for $s = 2, 4$,
and 8, $T_0$ has $6 \times 2 \times 2 \times 2$ tetrahedra. When we compare the extensions, we
see that the one based on $q_k$ ($E_1$) is better than the other two. The comparison
of $E_2$ and $E_3$ shows that the transition from quadratic FE to linear improves the
number of iterations slightly. In Table 3, the numbers of iterations with $E_1$ and
multiplicative preconditioner ($M_M$) are given. Comparing these numbers with
those from the additive ($M_A$) version, we see that $M_M$ is almost two times better
Table 3. Iterations with $E_1$ and multiplicative preconditioner

 s \ h    1/4  1/8  1/16  1/32
   2       18   18    19    20
   4       17   18    19    20
   8       16   18    19    20
  16        -   26    24    24


than $M_A$, but $M_M$ requires the solution of two systems with $M_0$ (preconditioners
inside the subdomains) while $M_A$ requires just one.

Test problem 2. For this test we choose 

/? = ((), 1)3 ro = aa a = | /i = §. 

5 5 

We take the exact solution as a sum of two functions: one smooth and one
rough which has support in $(0, \frac12)^3$ (see Figure 3):

ui{x,y,z)=^x, y,z) <P{x,y,z] = 5.10^ 4>{x)<f>{y),l>{z) 

■U2{3:,y,z) =4>{x,y,z) J «e(0,^), 

u^{x,y,z) =4>{x,y,z) + {I -x)x{l -y)yz ^ "^(b, 5 )- 

The first decomposition we consider with this test problem has $s = 8 = 2 \times 2 \times 2$




Fig. 3. Graphic of U 3 (x, y, |) 



subdomains. In this way the rough component of the solution is contained in
$\Omega_1 = (0, \frac12)^3$. We take two different levels of refinement: $l_i = l$ for $i = 2, \ldots, 8$
and a finer level $l_1$ for $\Omega_1$. In Table 4 are given the discrete energy norms of the
error and the respective number of iterations for some spaces with refinement in
$\Omega_1$. We see that for fixed $h$ (the mesh size outside $\Omega_1$) when the mesh inside $\Omega_1$ is
refined the error decreases. At some level the error inside and the error outside
Table 4. Discrete energy norm of the error (scaled by a power of 10) and number of
iterations with $E_1$ and multiplicative preconditioner

Number of iterations:

 h \ h_1    1/4    1/8    1/16   1/32
  1/4        12     13     13     13
  1/8               13     13     13
  1/16                     14     14
  1/32                            14

Discrete energy norm of the error:

 h \ h_1    1/4      1/8      1/16     1/32
  1/4       8.1067   2.6031   0.9491   0.8320
  1/8                2.6151   0.4737   0.1554
  1/16                        0.4885   0.0698
  1/32                                 0.0747


it are balanced (for example $h = 1/4$ and $h_1 = 1/16$) and more refinement in
$\Omega_1$ does not improve the approximation (compare $h_1 = 1/16$ and $h_1 = 1/32$ for
$h = 1/4$). This behavior is in agreement with the error estimate presented above.
The number of the iterations made by the PCG algorithm is again independent
of the mesh sizes, which is natural because we used multigrid algorithms for the
preconditioning.

One disadvantage of the discretizations with refinement in $\Omega_1$ (i.e. when
$l_1 > l$ or $h_1 < h$) is that the number of the unknowns in $\Omega_1$ is approximately
$(h/h_1)^3$ times larger than that in each of the other subdomains. Therefore the processor
corresponding to $\Omega_1$ has to do much more computation than the rest because the
number of computations increases linearly with the number of the unknowns. To
avoid this unbalanced discretization we divide $\Omega_1$ into $2 \times 2 \times 2$ equal subdomains.
The remaining 7 subdomains remain the same. Thus we obtain a balanced
discretization for the case when the mesh size inside $(0, \frac12)^3$ is two times smaller than
that outside of it. Note that the discrete space $V_h$ does not change. In Table 5
the numbers of iterations with two balanced discretizations are given. For the



Table 5. Iterations with balanced discretizations (with $M_A$ and $E_1$)

 h \ h_1    1/4    1/8    1/16   1/32
  1/4        21     35      -      -
  1/8               24     29      -
  1/16                     26     30
  1/32                            27


case $h_1 = h$ the discretization is unchanged ($s = 8$) and for the case $h_1 = \frac12 h$
we subdivide $(0, \frac12)^3$ into $2 \times 2 \times 2$ subdomains (i.e., $s = 15$). We see that the
balancing procedure we applied does not deteriorate the convergence rate of the
PCG algorithm.



264 Veselin Dobrev and Panayot Vassilevski 



References 

1. Bramble, J.: Multigrid methods. Pitman Research Notes in Mathematics v. 294,
Longman Scientific & Technical (1993).

2. J. H. Bramble, J. E. Pasciak and P. S. Vassilevski, "Computational scales of Sobolev
norms with application to preconditioning", Math. Comp. 69 (2000), 463-480.

3. V. Dobrev and P. S. Vassilevski, "Non-mortar finite elements for elliptic problems",
Proceedings of the Fourth Intern. Conference on Numerical Methods and
Applications (NMA'98), "Recent Advances in Numerical Methods and Applications"
(O. Iliev, M. Kaschiev, S. Margenov, Bl. Sendov and P. S. Vassilevski, eds.),
World Scientific, Singapore, 1999, pp. 756-765.

4. G. Haase, U. Langer, A. Meyer, and S. V. Nepomnyaschikh, Hierarchical extension
operators and local multigrid methods in domain decomposition preconditioners,
East-West J. Numer. Math. 2 (1994), 173-193.

5. S. V. Nepomnyaschikh, Optimal multilevel extension operators. Report SPC 95-3,
Jan. 1995, Technische Universität Chemnitz-Zwickau, Germany.

6. P. S. Vassilevski and O. Axelsson, "A two-level stabilizing framework for interface
domain decomposition preconditioners", in: Proceedings of the Third International
Conference $O(h^3)$, Sofia, Bulgaria, August 21-August 26, "Advances in Numerical
Methods and Applications" (I. T. Dimov, Bl. Sendov
and P. S. Vassilevski, eds.), World Scientific, Singapore, New Jersey, London, Hong
Kong, 1994, pp. 196-202.



Singularly Perturbed Parabolic Problems on 
Non-rectangular Domains* 
Ekaterinburg, Russia



Abstract. A singularly perturbed time-dependent convection-diffusion 
problem is examined on non-rectangular domains. The nature of the 
boundary and interior layers that arise depends on the geometry of the 
domains. For problems with different types of layers, various numerical 
methods are constructed to resolve the layers in the solutions and the 
numerical solutions are shown to converge independently of the singular 
perturbation parameter. 



1 Introduction 



We consider the following class of singularly perturbed parabolic problems

$$(P_\varepsilon) \qquad L_\varepsilon u(x,t) \equiv (\varepsilon u_{xx} + a u_x - b u_t - d u)(x,t) = f(x,t) \ \text{ on } D, \qquad (1a)$$

$$u(x,t) = g(x,t) \ \text{ on } \bar D \setminus D, \qquad (1b)$$

$$a \ge \alpha, \quad b \ge \beta > 0, \quad d \ge \delta > 0, \qquad (1c)$$

where $D = (\phi_1(t), \phi_2(t)) \times (0, T]$ is a non-rectangular domain bounded by the
curves $x = \phi_1(t)$, $x = \phi_2(t)$ such that

$$\phi_1(0) = 0, \quad \phi_2(0) = 1, \quad \phi_1(t) < \phi_2(t), \ \forall t,$$

and $0 < \varepsilon \le 1$ is the perturbation parameter. We also assume that the data
$a, b, d, f, g$ and $\phi_1, \phi_2$ are sufficiently smooth, and $f, g$ satisfy sufficient
compatibility conditions at the corners of the domain.

In order to generate numerical approximations to the solutions of problems
in $P_\varepsilon$, the problem is transformed to one on a rectangular domain. This is
achieved by introducing the new co-ordinate system $(\tilde x, \tilde t)$ and the change of
variables

$$\tilde x = \tilde x(x, t) = \frac{x - \phi_1(t)}{\phi_2(t) - \phi_1(t)}, \qquad \tilde t = t. \qquad (2)$$


* This research was supported in part by the National Centre for Plasma Science and 
Technology Ireland, by the Enterprise Ireland grant SC-98-612 and by the Russian 
Foundation for Basic Research under grant No. 98-01-00362. 
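The transformed class of problems is then

$$(\tilde P_\varepsilon) \qquad \tilde L_\varepsilon \tilde u(\tilde x, \tilde t) \equiv (\varepsilon \tilde u_{\tilde x \tilde x} + \tilde a \tilde u_{\tilde x} - \tilde b \tilde u_{\tilde t} - \tilde d \tilde u)(\tilde x, \tilde t) = \tilde f(\tilde x, \tilde t) \ \text{ on } \tilde D, \qquad (3a)$$

$$\tilde u(\tilde x, \tilde t) = \tilde g(\tilde x, \tilde t) \ \text{ on } \bar{\tilde D} \setminus \tilde D, \qquad (3b)$$

where

$$\tilde D = \Omega \times (0, T], \quad \Omega = (0, 1), \quad \tilde u(\tilde x, \tilde t) = u(x(\tilde x, \tilde t), \tilde t), \quad \tilde g(\tilde x, \tilde t) = g(x(\tilde x, \tilde t), \tilde t),$$
$$\tilde a = a(x,t)(\phi_2 - \phi_1) - b(x,t)\big(\phi_1'(x - \phi_2) - \phi_2'(x - \phi_1)\big),$$
$$\tilde b = b(x,t)(\phi_2 - \phi_1)^2, \quad \tilde d = d(x,t)(\phi_2 - \phi_1)^2, \quad \tilde f = f(x,t)(\phi_2 - \phi_1)^2,$$
$$\phi_i = \phi_i(t), \quad \phi_i' = \phi_i'(t), \quad x = x(\tilde x, \tilde t) = \tilde x(\phi_2 - \phi_1) + \phi_1 .$$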

Notice that irrespective of $\phi_1$ and $\phi_2$, $\tilde b > 0$ and $\tilde d > 0$. However, in general, the
sign of $\tilde a$ may differ from that of $a$ at certain points of the domain. Thus the sign of $\tilde a$,
which is crucial in selecting a suitable numerical method for $\tilde P_\varepsilon$, depends on the
shape of the original domain and the original coefficient functions $a$ and $b$.
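For readers who want to check the transformed coefficients, the chain-rule computation can be written out as follows. This is a worked verification added for convenience, and the abbreviation $\Delta(t) = \phi_2(t) - \phi_1(t)$ is introduced here only for this purpose.

```latex
% Change of variables: \tilde x = (x - \phi_1)/\Delta, \tilde t = t, with \Delta = \phi_2 - \phi_1.
\frac{\partial \tilde x}{\partial x} = \frac{1}{\Delta}, \qquad
\frac{\partial \tilde x}{\partial t}
  = \frac{-\phi_1' \Delta - (x - \phi_1)\Delta'}{\Delta^2}
  = -\frac{\phi_1'(1 - \tilde x) + \phi_2' \tilde x}{\Delta}.

% Derivatives of u(x,t) = \tilde u(\tilde x, \tilde t):
u_x = \frac{\tilde u_{\tilde x}}{\Delta}, \qquad
u_{xx} = \frac{\tilde u_{\tilde x \tilde x}}{\Delta^2}, \qquad
u_t = \tilde u_{\tilde t} - \frac{\phi_1'(1 - \tilde x) + \phi_2' \tilde x}{\Delta}\, \tilde u_{\tilde x}.

% Substitute into \varepsilon u_{xx} + a u_x - b u_t - d u = f and multiply by \Delta^2:
\varepsilon \tilde u_{\tilde x \tilde x}
  + \Delta \bigl[ a + b \bigl( \phi_1'(1 - \tilde x) + \phi_2' \tilde x \bigr) \bigr] \tilde u_{\tilde x}
  - b \Delta^2 \tilde u_{\tilde t} - d \Delta^2 \tilde u = f \Delta^2 .

% Since x - \phi_2 = -\Delta (1 - \tilde x) and x - \phi_1 = \Delta \tilde x,
\tilde a = a \Delta - b \bigl( \phi_1'(x - \phi_2) - \phi_2'(x - \phi_1) \bigr), \qquad
\tilde b = b \Delta^2, \quad \tilde d = d \Delta^2, \quad \tilde f = f \Delta^2 .
```

The factor $\Delta^2$ multiplying $b$, $d$ and $f$ comes from clearing the denominator of the diffusion term, so the coefficient of $\varepsilon \tilde u_{\tilde x \tilde x}$ stays equal to one.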

2 Straight Line Walls 

As the expression for $\tilde a$ is quite complicated in the general case, we assume that
the functions $\phi_1$ and $\phi_2$ are linear. That is, assume that

$$\phi_1(t) = -m_1 t, \qquad \phi_2(t) = 1 - m_2 t. \qquad (4)$$

The resulting problem class, P^ , is thus 

Pe C Pe (5) 

where 



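$$\tilde a = (1 - (m_2 - m_1)t)\big(a(x, t) - b(x, t)(\tilde x(m_2 - m_1) + m_1)\big),$$

$$\tilde b = b(x, t)(1 - (m_2 - m_1)t)^2, \quad \tilde d = d(x, t)(1 - (m_2 - m_1)t)^2,$$

$$\tilde f = f(x, t)(1 - (m_2 - m_1)t)^2, \quad \tilde g = g(x, t),$$

$$x = x(\tilde x, \tilde t) = \tilde x(1 - (m_2 - m_1)t) - m_1 t.$$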

We now deal with two special cases of the above problem class. 

2.1 Parallel Straight Line Walls 

The first special case we consider is when the side walls are parallel, i.e. $m_1 = m_2 = m$,
and the coefficient functions $a$ and $b$ are constant. That is

$$a(x,t) = \alpha, \quad b(x,t) = \beta, \quad (x,t) \in D. \qquad (6)$$
The problem class, Pf, is thus 

Pf C (7) 

where 

$$\tilde a = \alpha - m\beta, \quad \tilde b = \beta, \quad \tilde d = d(x,t), \quad \tilde f = f(x,t), \quad \tilde g = g(x,t), \quad x = \tilde x - mt.$$



Depending on the values of a and j3, Pf falls naturally into one of three distinct 
problem classes: 



pp c pp u po U P - 


(8) 


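where

$$P_\varepsilon^{+} = \{ \tilde P_\varepsilon \mid \tilde a(\tilde x, \tilde t) > 0, \ \forall (\tilde x, \tilde t) \in \tilde D \}, \qquad (9a)$$

$$P_\varepsilon^{0} = \{ \tilde P_\varepsilon \mid \tilde a(\tilde x, \tilde t) = 0, \ \forall (\tilde x, \tilde t) \in \tilde D \}, \qquad (9b)$$

$$P_\varepsilon^{-} = \{ \tilde P_\varepsilon \mid \tilde a(\tilde x, \tilde t) < 0, \ \forall (\tilde x, \tilde t) \in \tilde D \}. \qquad (9c)$$

For problems from the first and third classes, $P_\varepsilon^{+}$ and $P_\varepsilon^{-}$, the solution possesses
a regular boundary layer, in a neighbourhood of $\tilde x = 0$ in the former and in a
neighbourhood of $\tilde x = 1$ in the latter. In the second case, $P_\varepsilon^{0}$, the solution has
parabolic boundary layers in a neighbourhood of both $\tilde x = 0$ and $\tilde x = 1$.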

Clearly 


Pp C Pp if a > m/3, 
Pp C P° if a = m/3, 
Pp C P“ if a < m/3. 





2.2 Non-parallel Straight Line Walls 

The next special case we consider is when both $\phi_1$ and $\phi_2$ are still straight lines,
but are now no longer parallel. The former is sloped as before but the latter will
be positioned vertically, i.e., $m_1 = m$, $m_2 = 0$ and we also assume that $m > 0$.
As before we assume that $a$ and $b$ are constant.

The problem class, Pp , is thus 

pp c P^ (10) 

where 

$$\tilde a = (1 + mt)(\alpha + \beta m(\tilde x - 1)), \quad \tilde b = \beta(1 + mt)^2, \quad \tilde d = d(x,t)(1 + mt)^2,$$
$$\tilde f = f(x,t)(1 + mt)^2, \quad \tilde g = g(x,t), \quad x = \tilde x(1 + mt) - mt.$$
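The sign behaviour of the transformed convection coefficient for straight walls can be explored with a short sketch. The function names `a_tilde` and `classify_nonparallel` are hypothetical helpers introduced for illustration; the formulas are the ones given above for $\phi_1(t) = -m_1 t$, $\phi_2(t) = 1 - m_2 t$ with constant $a = \alpha$, $b = \beta$.

```python
def a_tilde(x_tilde, t, alpha, beta, m1, m2):
    """Transformed convection coefficient for walls phi1(t) = -m1*t, phi2(t) = 1 - m2*t."""
    width = 1.0 - (m2 - m1) * t                      # phi2(t) - phi1(t)
    return width * (alpha - beta * (x_tilde * (m2 - m1) + m1))


def classify_nonparallel(alpha, beta, m):
    """Sign pattern of a_tilde on [0, 1] for m1 = m > 0, m2 = 0 and alpha > 0."""
    if alpha > m * beta:
        return "positive everywhere"
    if alpha == m * beta:
        return "zero at x_tilde = 0, positive elsewhere"
    # a_tilde vanishes where alpha + beta*m*(x_tilde - 1) = 0
    return "changes sign at x_tilde = %g" % (1.0 - alpha / (m * beta))


# alpha = beta = 1 with slopes m = 1 and m = 2 (one wall vertical).
case_m1 = classify_nonparallel(1.0, 1.0, 1.0)
case_m2 = classify_nonparallel(1.0, 1.0, 2.0)
```

With $\alpha = \beta = 1$, the slope $m = 1$ gives a coefficient that degenerates at the left wall, while $m = 2$ gives a sign change at an interior point.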

Again depending on the values of a and (3, Pp will fall into a particular problem 
class. We can identify three types of problem subclasses 

pi+ y pio y pi- ^ pi 



( 11 ) 
where 

P^+ = {PY\a>ml3}, (12a) 

4'° = {P/| a = m/3}, (12b) 

P^~ = { P/ I a < m/3, a > 0}. (12c) 

For problems from the class P^~ , the solution exhibits no boundary layer (due 
to the compatibility conditions), while for problems from the two classes P^° 
and Pg we have a boundary layer in a neighbourhood of x = 0 (more precisely, 
a parabolic layer in the former). 

Note that we have 



PY c p+ u p^ u Pi u Pa U Pa 

where, for ( G (0, 1) and 7 > 0, we define 



H = 


{Pel 


Al 


7X, V(x,t) 






II 


{Pel 


a(x, t) < 


-7(1 - x). 


V(x,t) gD} 










r < 0 X 


< C, yi G [0, 


T] 






1 a{x,t) 


< = 0 X 


= C, Vte[o, 


T] 




1 




[ > 0 X 


> C; yi G [0, 


T] 


that Pg 


'+ c 


p+ pio 
^ 6 ^ e 


c P' P'“ 


cPp 





In the next section we construct numerical methods that resolve the layers 
that arise in each of the six problem classes, P+, Pg+, P^, P^^, PJ~ and P“, 
encountered in this section. 



3 Numerical Methods 

We now construct appropriate numerical methods for generating approximate 
solutions to problems from each class. Note however that any problem from 
Pg“ can be transformed into an equivalent problem in P+, using the change of 
variables x = 1 — x. Therefore we need only be concerned with the numerical 
solution of problems from classes P+, P^, PJ:^ and P^~ ■ 

Before we introduce the numerical methods we need some criteria to decide
whether a given method is adequate for the problem in question. We would
ideally like globally-defined, pointwise-accurate, $\varepsilon$-uniform monotone numerical
methods. For a discussion of these concepts see Farrell et al. [1].

To generate numerical solutions for problems from all the above classes, we 
construct a numerical method consisting of a standard finite difference operator 
and a piecewise uniform fitted mesh. The only exception to this is in the case of 
class PJ~ where we use a uniform mesh. 

First of all we consider class P+ ( all considerations are similar for P^^). We 
use the following piecewise uniform mesh in the $x$-direction. Divide $\Omega$ into two
subintervals

$$\bar\Omega = \bar\Omega_l \cup \bar\Omega_r ,$$

where $\Omega_l = (0, \sigma)$, $\Omega_r = (\sigma, 1)$ and the fitting factor $\sigma$ is chosen to be

$$\sigma = \min\left\{ \frac12, \ \frac{\varepsilon}{\alpha} \ln N \right\},$$

where $N$ is the number of mesh elements in the $x$-direction and $\alpha$ is the lower
bound on $\tilde a$. We construct our piecewise uniform mesh $\bar\Omega_\sigma^N$ on $\bar\Omega$ by placing
a uniform mesh in the subintervals $\Omega_l$, $\Omega_r$ using $N/2$ mesh elements in each
subinterval. A uniform mesh $\bar\Omega^M$ with $M$ mesh elements is used on $(0, T)$. We
then define the fitted piecewise uniform mesh to be

$$\bar D_\sigma^{N,M} = \bar\Omega_\sigma^N \times \bar\Omega^M .$$
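The construction of the piecewise uniform mesh in the $x$-direction can be sketched in a few lines. The function names are hypothetical and the code covers only the two-subinterval mesh with transition point $\sigma = \min\{1/2, (\varepsilon/\alpha) \ln N\}$ described above.

```python
import math


def fitting_factor(eps, alpha, n_intervals):
    """Transition point sigma = min(1/2, (eps/alpha) * ln N)."""
    return min(0.5, (eps / alpha) * math.log(n_intervals))


def piecewise_uniform_mesh(eps, alpha, n_intervals):
    """Mesh points on [0, 1]: N/2 equal intervals on [0, sigma] and N/2 on [sigma, 1]."""
    if n_intervals % 2 != 0:
        raise ValueError("n_intervals must be even")
    sigma = fitting_factor(eps, alpha, n_intervals)
    half = n_intervals // 2
    fine = [sigma * i / half for i in range(half + 1)]                       # layer region
    coarse = [sigma + (1.0 - sigma) * i / half for i in range(1, half + 1)]  # outer region
    return fine + coarse


mesh = piecewise_uniform_mesh(eps=1e-3, alpha=1.0, n_intervals=8)
```

When $\varepsilon$ is large enough that $\sigma = 1/2$, the same routine returns the uniform mesh, so no special case is needed.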



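The resulting numerical method is thus

$$(P_\varepsilon^{+,N}) \qquad \varepsilon \delta_x^2 U + \tilde a D_x^+ U - \tilde b D_t^- U - \tilde d U = \tilde f \ \text{ on } D_\sigma^{N,M},$$

with $U = \tilde u$ at the boundary mesh points of $\bar D_\sigma^{N,M}$.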

Theorem 1. For problems from class $P_\varepsilon^{+}$ which are sufficiently compatible at
the corners, the numerical approximations generated by the numerical method
$(P_\varepsilon^{+,N})$ are $\varepsilon$-uniform and satisfy the following error estimate

$$\sup_{0 < \varepsilon \le 1} \|U - \tilde u\|_{\bar D_\sigma^{N,M}} \le C N^{-1} (\ln N)^2 + C M^{-1} ,$$

where $C$ is a constant independent of $N$, $M$ and $\varepsilon$.



Proof. See Shishkin [3]. 

To numerically solve problems from the classes and P^*°, we use the same fi- 
nite difference operators but the fitted mesh used is different. First of all consider 
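class $P_\varepsilon^{0}$. In this case the interval $\Omega$ is divided into three subintervals

$$\bar\Omega = \bar\Omega_l \cup \bar\Omega_c \cup \bar\Omega_r ,$$

where $\Omega_l = (0, \sigma)$, $\Omega_c = (\sigma, 1 - \sigma)$, $\Omega_r = (1 - \sigma, 1)$ and the fitting factor $\sigma$ is
chosen to be

$$\sigma = \min\left\{ \frac14, \ 2\sqrt{\varepsilon} \ln N \right\}.$$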

The fitted piecewise uniform mesh is then defined as in the previous case. The 
resulting numerical method is denoted by P^’^ ■ 

Theorem 2. For problems from class $P_\varepsilon^{0}$ which are sufficiently compatible at
the corners, the numerical approximations generated by the numerical method
$(P_\varepsilon^{0,N})$ are $\varepsilon$-uniform and satisfy the following error estimate

$$\sup_{0 < \varepsilon \le 1} \|U - \tilde u\|_{\bar D_\sigma^{N,M}} \le C (N^{-1} \ln N)^2 + C M^{-1} ,$$

where $C$ is a constant independent of $N$, $M$ and $\varepsilon$.

Proof. See, for example, Miller et al. [2]. 

For problems from class we use a similar numerical method as that used for 
problems from class $P_\varepsilon^{+}$, but with the fitting factor chosen to be

$$\sigma = \min\left\{ \frac12, \ 2\sqrt{\varepsilon} \ln N \right\}.$$

We denote the resulting numerical method by P^^’^ . 

As noted above for problems from class P^~ it suffices to use a uniform mesh, 
and the standard finite difference operator. This is due to the fact that the layer 
that arises is a weak interior layer, in the sense that the solution in the layer 
region does not possess extremely large gradients, as would be the case with the 
other types of layers considered in this paper. Denote this method by P^~'^ . 

In the next section we demonstrate numerically that the methods introduced
for the latter two cases are $\varepsilon$-uniform for problems from the appropriate classes.

4 Numerical Results 

As a particular example of a problem from class P^ we let (fi and 4>2 be chosen 
as in §2.2. Take T = 1 and m = 1 and let the original problem be 



euxx + Ux - ut - u = -X - 1, 


on (-t, 1) X (0,1], 


(14a) 


u(x, 0) = 1 — x^, 


X € (0, 1), 


(14b) 


u{—t,t) = l, u(l,t) = 0, 


t > 0. 


(14c) 



It is clear that we have a = mfi and thus the transformed problem will be in 
class Pg°. Here we have a parabolic boundary layer at a; = 0. 

As a particular example of a problem from class Pj‘~ , we again let (f>i and 
(p 2 be chosen as in §2.2. Take T = 1 and m = 2 and let the original problem be 



euxx + Ux - ut - u = -X - 1, 


on (—2t, 1) X (0, 1], 


(15a) 


u(x, 0) = 1 — x^, 


X G (0,1), 


(15b) 


u(—2t,t') = l, u(l,t) = 0, 


t > 0. 


(15c) 



Here a < mfi and thus the transformed problem will be in the class Pj~ ■ In 
this case we have no layer in the main term of an asymptotic expansion (only a 
weak layer arises due to the compatibility condition being not of a sufficiently 
high order). 

We take $N = M$ and tabulate the computed errors $E_\varepsilon^N$, and the computed
$\varepsilon$-uniform errors $E^N$, for a variety of values of $\varepsilon$ and $N$, for both problems using the
methods described in §3 (see Tables 1 and 2). In both cases we use the numerical
Table 1. Table of computed errors for problem (14), using the fitted mesh method of §3

                          Number of Intervals N
 $\varepsilon$      8          16         32         64         128        256
 1.0        2.09e-02   1.22e-02   7.12e-03   3.69e-03   1.79e-03   7.85e-04
 2^-1       2.98e-02   1.55e-02   7.81e-03   3.82e-03   1.79e-03   7.69e-04
 2^-2       4.68e-02   2.47e-02   1.25e-02   6.13e-03   2.88e-03   1.24e-03
 2^-3       6.19e-02   3.30e-02   1.68e-02   8.31e-03   3.91e-03   1.68e-03
 2^-4       7.32e-02   3.96e-02   2.04e-02   1.01e-02   4.76e-03   2.05e-03
 2^-5       8.09e-02   4.42e-02   2.29e-02   1.14e-02   5.39e-03   2.33e-03
 2^-6       8.54e-02   4.69e-02   2.45e-02   1.22e-02   5.76e-03   2.49e-03
 2^-7       1.00e-01   4.92e-02   2.53e-02   1.26e-02   5.94e-03   2.56e-03
 2^-8       1.06e-01   5.88e-02   2.82e-02   1.28e-02   6.03e-03   2.60e-03
 2^-9       1.14e-01   6.63e-02   3.31e-02   1.55e-02   6.81e-03   2.67e-03
 2^-10      1.21e-01   7.04e-02   3.65e-02   1.76e-02   8.05e-03   3.35e-03
 2^-11      1.25e-01   7.25e-02   3.86e-02   1.90e-02   8.82e-03   3.72e-03
 2^-12      1.29e-01   7.38e-02   4.03e-02   2.00e-02   9.36e-03   3.98e-03
 2^-13      1.31e-01   7.57e-02   4.13e-02   2.07e-02   9.74e-03   4.17e-03
 2^-14      1.32e-01   7.70e-02   4.20e-02   2.12e-02   1.00e-02   4.30e-03
 2^-15      1.33e-01   7.79e-02   4.25e-02   2.15e-02   1.02e-02   4.39e-03
 2^-16      1.34e-01   7.85e-02   4.29e-02   2.17e-02   1.03e-02   4.46e-03
 2^-17      1.34e-01   7.89e-02   4.31e-02   2.19e-02   1.04e-02   4.50e-03
 2^-32      1.35e-01   8.00e-02   4.36e-02   2.23e-02   1.06e-02   4.61e-03
 $E^N$      1.35e-01   8.00e-02   4.36e-02   2.23e-02   1.06e-02   4.61e-03


Table 2. Table of computed errors for problem (15), using the uniform mesh method of §3

                          Number of Intervals N
 $\varepsilon$      8          16         32         64         128        256
 2^0        4.78e-02   2.51e-02   1.27e-02   6.19e-03   2.91e-03   1.25e-03
 2^-1       6.67e-02   3.65e-02   1.84e-02   8.99e-03   4.21e-03   1.81e-03
 2^-2       8.93e-02   5.06e-02   2.57e-02   1.27e-02   5.95e-03   2.56e-03
 2^-3       1.17e-01   6.57e-02   3.43e-02   1.70e-02   8.02e-03   3.46e-03
 2^-4       1.40e-01   7.94e-02   4.24e-02   2.11e-02   1.00e-02   4.34e-03
 2^-5       1.55e-01   8.98e-02   4.84e-02   2.43e-02   1.16e-02   5.00e-03
 2^-6       1.64e-01   9.64e-02   5.19e-02   2.61e-02   1.24e-02   5.38e-03
 2^-7       1.69e-01   1.00e-01   5.37e-02   2.70e-02   1.29e-02   5.57e-03
 2^-8       1.71e-01   1.02e-01   5.46e-02   2.75e-02   1.31e-02   5.66e-03
 2^-9       1.73e-01   1.03e-01   5.50e-02   2.77e-02   1.32e-02   5.70e-03
 2^-10      1.73e-01   1.03e-01   5.53e-02   2.78e-02   1.32e-02   5.73e-03
 2^-11      1.74e-01   1.03e-01   5.54e-02   2.79e-02   1.33e-02   5.74e-03
 2^-12      1.74e-01   1.03e-01   5.54e-02   2.79e-02   1.33e-02   5.75e-03
 2^-13      1.74e-01   1.04e-01   5.55e-02   2.79e-02   1.33e-02   5.75e-03
 2^-32      1.74e-01   1.04e-01   5.55e-02   2.79e-02   1.33e-02   5.75e-03
 $E^N$      1.74e-01   1.04e-01   5.55e-02   2.79e-02   1.33e-02   5.75e-03

Fig. 1. Numerical solutions of problems (14) and (15), panels (a) and (b) respectively,
generated by the methods of §3 with $N = 64$ and a particular value of $\varepsilon$

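solution on the finest mesh available, namely $N = 1024$, as the approximation to
the exact solution. The computed pointwise errors, $E_\varepsilon^N$ and $E^N$, are defined as

$$E_\varepsilon^N = \max_{\bar D^{N,M}} |U^N - \bar U^{1024}|, \qquad E^N = \max_{\varepsilon} E_\varepsilon^N ,$$

where $\bar U^{1024}$ denotes the numerical solution computed with $N = 1024$, interpolated to the coarser mesh.

In both of these tables we see the maximum errors decrease as $N$ increases for
each value of $\varepsilon$ and that the $\varepsilon$-uniform errors, $E^N$, also decrease with increasing
$N$. This demonstrates numerically that these methods are $\varepsilon$-uniform for the
problem classes in question. In Figure 1 we plot the numerical solution of these
problems for particular values of $\varepsilon$ and $N$.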
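One way to quantify the decrease of the $\varepsilon$-uniform errors is to compute observed orders of convergence from consecutive entries. The sketch below does this for the $E^N$ row of Table 1; the helper name `observed_orders` is an assumption, and the errors themselves are measured against the $N = 1024$ reference solution rather than the exact solution, so the rates are indicative only.

```python
import math

# epsilon-uniform errors E^N for problem (14), N = 8, 16, ..., 256 (last row of Table 1).
N_values = [8, 16, 32, 64, 128, 256]
E_uniform = [1.35e-01, 8.00e-02, 4.36e-02, 2.23e-02, 1.06e-02, 4.61e-03]


def observed_orders(errors):
    """log2 of the ratio of consecutive errors when N is doubled."""
    return [math.log2(coarse / fine) for coarse, fine in zip(errors, errors[1:])]


orders = observed_orders(E_uniform)
for n, p in zip(N_values, orders):
    print(f"N = {n:3d} -> {2 * n:3d}: observed order {p:.2f}")
```

For these data the observed orders rise from about 0.75 to about 1.2 as $N$ grows.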
Abstract. Badly conditioned operator problems in Hilbert spaces are 
characterized by very large condition numbers. For special types of such 
problems, their reduction to ones with strongly saddle operators leads 
to remarkable improvement of correctness and to justification of the fa- 
mous Bakhvalov — Kolmogorov principle about asymptotically optimal 
algorithms. 

The first goal of the present paper is to present a short review of recently 
obtained results for stationary problems in classical Sobolev and more 
general energy spaces. The second goal is a study of the approach indi- 
cated above to the case of nonstationary problems; special attention is 
paid to parabolic problems with large jumps in coefficients; the study is 
based on relatively new extension theorems and special energy methods. 



1 Introduction 

1.1 Normally Invertible Operators 

Only real Hilbert spaces and bounded operators are considered in this paper;
the normed linear space of linear bounded operators mapping a space $U$ into a
space $F$ is denoted by $\mathcal{L}(U; F)$; $\mathcal{L}(H) \equiv \mathcal{L}(H; H)$. $\mathrm{Ker}\, A \equiv \{v : Av = 0\} \equiv$
the kernel (null-space) of the operator $A$; $\mathrm{Im}\, A \equiv$ the image (range) of the
operator $A$; $I \equiv$ the identity operator; $H^* \equiv$ the linear space of bounded linear
functionals $l$ mapping $H$ into $\mathbb{R}$; $A^* \equiv$ the adjoint operator to $A \in \mathcal{L}(H_1; H_2)$;
for $A \in \mathcal{L}(H)$, $[A] \equiv 2^{-1}(A + A^*)$; $\mathcal{L}^+(H)$ denotes the set of linear, symmetric,
and positive definite operators in $\mathcal{L}(H)$; $H(B) \equiv$ the Hilbert space differing
from $H$ only by the inner product defined by $B \in \mathcal{L}^+(H)$, namely $(u, v)_{H(B)} \equiv
(u, v)_B \equiv (Bu, v)_H \equiv (Bu, v)$.

Operators $A_{2,1} \in \mathcal{L}(H_1; H_2)$ with $\mathrm{Im}\, A_{2,1} = H_2$ are called normally invertible;
they correspond to a particular case of normally solvable operators which
are defined as operators such that $\mathrm{Im}\, A_{2,1}$ is a subspace in $H_2$ (operators with
closed images); if $A_{2,1}$ is a normally solvable operator, then $H_1$ is an orthogonal
sum of $\mathrm{Ker}\, A_{2,1}$ and $\mathrm{Im}\, A_{1,2}$, i.e.,

$$H_1 = \mathrm{Ker}\, A_{2,1} \oplus \mathrm{Im}\, A_{1,2}; \qquad A_{1,2} \equiv A_{2,1}^* . \qquad (1.1)$$

L. Vulkov, J. Wasniewski, and P. Yalamov (Eds.): NAA 2000, LNCS 1988, pp. 273—284, 2001. 

(c) Springer- Verlag Berlin Heidelberg 2001 
Note that the indicated operators are fundamental in the theory of Fredholm
equations (see [1-3] and references therein).

A normally invertible operator $A_{2,1}$ yields a one-to-one mapping of the Hilbert
space $\mathrm{Im}\, A_{1,2}$ (orthogonal complement in $H_1$ to $\mathrm{Ker}\, A_{2,1}$) onto $H_2$ and, by the
Banach theorem, this mapping is invertible and the corresponding inverse (the
right inverse) $A_{2,1}^{\dagger}$ is such that

$$\|A_{2,1}^{\dagger}\| = \sigma^{-1} < \infty. \qquad (1.2)$$

We note that the well-known inf-sup condition

$$\inf_{u_2 \in H_2} \sup_{u_1 \in H_1} \frac{(A_{2,1} u_1, u_2)_{H_2}}{\|u_1\|_{H_1} \|u_2\|_{H_2}} \ge \sigma > 0 \qquad (1.3)$$

is often used instead of (1.1), (1.2); (1.3) can be written in the form

$$\|A_{2,1}^* u_2\|_{H_1} \ge \sigma \|u_2\|_{H_2}, \quad \forall u_2 \in H_2 \qquad (1.4)$$

(see [3], [4] and references therein) and implies that $A_{2,1} A_{1,2} \in \mathcal{L}^+(H_2)$.



1.2 Strongly Saddle Operators and Their Generalizations 

In the Hilbert space $H = H_1 \times H_2$, we consider $A_\alpha \in \mathcal{L}(H)$ of the form

$$A_\alpha \equiv \begin{bmatrix} A_{1,1} & A_{1,2} \\ A_{2,1} & -\alpha A_{2,2} \end{bmatrix}, \qquad (1.5)$$

where $A_{i,j} \in \mathcal{L}(H_j; H_i)$, $\alpha \ge 0$, $A_{2,1}$ is a normally invertible operator;

$$[A_{1,1}] \in \mathcal{L}^+(H_1), \quad A_{1,2} = A_{2,1}^*, \quad [A_{2,2}] \ge 0.$$

Under the above conditions, $A_\alpha$ is called a generalized strongly saddle operator. For
such operators, it was proved (see [3], Theorem 7.1.3) that $A_\alpha$ is invertible and

$$\|A_\alpha^{-1}\| \le K, \qquad (1.6)$$

where the constant $K$ can be chosen uniformly for all $\alpha \ge 0$ and all $A_{2,2}$ with
$[A_{2,2}] \ge 0$. This implies that problem

$$A_\alpha u = f \qquad (1.7)$$

is correctly posed. Moreover, (1.6) implies that the condition number $\kappa(A_\alpha) \equiv
\|A_\alpha\| \|A_\alpha^{-1}\| \asymp 1$ if $\alpha \in [0, \alpha_0]$ and that optimal perturbation estimates (see
[3,5]) hold; they follow from (1.6) and the known inequality $\|A^{-1} - A_\alpha^{-1}\| \le
\|A^{-1}\| \, \|A_\alpha - A\| \, \|A_\alpha^{-1}\|$; first such results were obtained in [5] (see also [3])
for $A_{2,1}$ associated with the divergence operator.

Note that $A_\alpha$ in (1.5) is a strongly saddle operator if $[A_{i,i}] = A_{i,i}$, $i \in [1, 2]$;
if additionally $A_{2,2} \in \mathcal{L}^+(H_2)$, $\alpha > 0$ and $f_2 = 0$ then the first component of
the solution of (1.7) coincides with the solution of problem

$$\mathcal{A}_\alpha u_1 \equiv A_{1,1} u_1 + \frac{1}{\alpha} A_{1,2} A_{2,2}^{-1} A_{2,1} u_1 = f_1, \qquad (1.8)$$

which might serve as a typical example of variational problems involving a large
parameter $1/\alpha$; the condition number is very large (of order $1/\alpha$);
hence it greatly complicates the construction of good numerical methods and
algorithms.
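The contrast between the saddle formulation (1.7) and the reduced problem (1.8) can be observed on a tiny finite-dimensional example. The matrices below are arbitrary illustrative choices (identity blocks for $A_{1,1}$ and $A_{2,2}$, a full-rank $2 \times 4$ matrix for $A_{2,1}$), not data from the paper.

```python
import numpy as np

A11 = np.eye(4)                                  # symmetric positive definite block
A21 = np.array([[1.0, 1.0, 0.0, 0.0],
                [0.0, 1.0, 1.0, 0.0]])           # full row rank, so Im A21 = R^2
A22 = np.eye(2)


def saddle_matrix(alpha):
    """Block operator [[A11, A21^T], [A21, -alpha * A22]] as in (1.5)."""
    return np.block([[A11, A21.T], [A21, -alpha * A22]])


def penalty_matrix(alpha):
    """Reduced operator A11 + (1/alpha) * A21^T A22^{-1} A21 as in (1.8)."""
    return A11 + (1.0 / alpha) * A21.T @ np.linalg.solve(A22, A21)


for alpha in (1e-2, 1e-4, 1e-6):
    print(alpha, np.linalg.cond(saddle_matrix(alpha)), np.linalg.cond(penalty_matrix(alpha)))
```

In this example the condition number of the saddle matrix stays of order one as $\alpha \to 0$, while that of the reduced matrix grows like $1/\alpha$.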



1.3 Regularization of Certain Badly Conditioned Operator Problems

Operator problems of type (1.8) with large parameters $1/\alpha \gg 1$ ($\alpha \to +0$) in Sobolev and more general energy spaces $H_1$ can be found in many important branches of mathematical physics; the corresponding variational problem

$$u_1 = \arg\min_{v_1 \in H_1} [I_2(v_1) - 2l(v_1)], \quad I_2(v_1) \equiv (A_{1,1}v_1, v_1) + \frac{1}{\alpha}\|A_{2,1}v_1\|^2_{A_{2,2}^{-1}} \qquad (1.9)$$

can be connected with an application of the standard penalty method for a problem with the linear constraint $A_{2,1}v_1 = 0$; we stress that the classical Lagrange approach (the Lagrange multiplier method) to the variational problem with this constraint yields the well-conditioned problem (1.7) with $\alpha = 0$, $f_2 = 0$ (the additional function $u_2$ plays the role of the Lagrange multiplier).

Thanks to an understanding of the role of (1.7) and its grid analogs in the theory of projective-grid (finite element) methods and iterative processes, it now seems reasonable to regard problems (1.7) in the Hilbert space $H = H_1 \times H_2$ as basic. If

$$H_2 = \prod_{k=1}^{k^*} H_{2,k}; \quad A_{2,1}v_1 \equiv [A_{2,1,1}v_1, \ldots, A_{2,1,k^*}v_1] \qquad (1.10)$$

($A_{2,1,k}v_1 \in H_{2,k}$) then it is even possible to deal with variational problems with several large parameters $1/\alpha_k$, $k \in [1, k^*]$; for example, from (1.9), (1.10) with

$$I_2(v_1) \equiv (A_{1,1}v_1, v_1) + \sum_{k=1}^{k^*} \frac{1}{\alpha_k}\|A_{2,1,k}v_1\|^2_{H_{2,k}} \qquad (1.11)$$

we can pass to problems (1.7) with the block $\alpha A_{2,2}$ in (1.5) replaced by the block-diagonal operator

$$\alpha A_{2,2} \equiv \mathrm{diag}(\alpha_1 I_{2,1}, \ldots, \alpha_{k^*} I_{2,k^*}), \qquad (1.12)$$

where $\alpha_k > 0$ and $I_{2,k}$ is the identity operator in $H_{2,k}$, $k \in [1, k^*]$.

The indicated remarkable improvement of correctness sometimes leads even to the construction of asymptotically optimal numerical methods and algorithms under natural conditions on the smoothness of the solution (to a justification of the famous Bakhvalov-Kolmogorov principle about asymptotically optimal algorithms; see [3]). We recall that projective methods for problems with saddle operators make use of a special sequence of finite-dimensional subspaces $\hat H_h \equiv \hat H_{1,h} \times \hat H_{2,h} \subset H$ approximating the original Hilbert space $H$ ($H_r$ is approximated by the sequence $\hat H_{r,h} \equiv \hat H_r$, $r = 1, 2$); it is required that

$$\inf_{\hat u_2 \in \hat H_2} \sup_{\hat u_1 \in \hat H_1} \frac{(A_{2,1}\hat u_1, \hat u_2)_{H_2}}{\|\hat u_1\|_{H_1}\|\hat u_2\|_{H_2}} \ge \sigma_0 > 0, \qquad (1.13)$$

where $\sigma_0$ is independent of $h$; (1.13) implies that $\|\hat A_{2,1}^{\dagger}\| \le \sigma_0^{-1} < \infty$ ($\hat A_{2,1}$ is an approximation of $A_{2,1}$).
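For a concrete pair of finite-dimensional subspaces, the constant in an inf-sup condition of type (1.13) can be computed numerically. The sketch below rests on assumptions that are not in the text: the pairing $(A_{2,1}u_1, u_2)$ is represented as `u2.T @ B @ u1` for an `m x n` matrix `B` with `m <= n`, and the norms are induced by symmetric positive definite Gram matrices `G1`, `G2` (identity by default). Under these assumptions the inf-sup constant equals the smallest singular value of the scaled matrix $L_2^{-1} B L_1^{-T}$, where $G_r = L_r L_r^T$. The function name `inf_sup_constant` is hypothetical.

```python
import numpy as np

def inf_sup_constant(B, G1=None, G2=None):
    """Discrete inf-sup constant of B (m x n, m <= n) for norms given by SPD matrices G1, G2."""
    m, n = B.shape
    G1 = np.eye(n) if G1 is None else G1
    G2 = np.eye(m) if G2 is None else G2
    L1 = np.linalg.cholesky(G1)   # G1 = L1 L1^T
    L2 = np.linalg.cholesky(G2)   # G2 = L2 L2^T
    # Scaled matrix L2^{-1} B L1^{-T}; its smallest singular value is the inf-sup constant.
    X = np.linalg.solve(L2, B)
    X = np.linalg.solve(L1, X.T).T
    return np.linalg.svd(X, compute_uv=False).min()
```

Tracking this value over a sequence of refined grids is one way to check empirically whether a bound independent of `h` is plausible; it does not replace a proof.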



1.4 Improved Correctness of Problems with Strongly Saddle 
Operators and Their Generalizations 

Problems (1.8)-(1.12) can sometimes be reduced to those of type (1.7) in a better Hilbert space $G = G_1 \times G_2 \subset H$; such a reduction was indicated in [7] and is based on the following lemma (see [2,7]):

Lemma 1.1. Let $A_{2,1} \in \mathcal{L}(H_1; H_2)$ be normally invertible and the embedding operator of a Hilbert space $G_2$ into $H_2$ be bounded. Let $G_1$ be a subset of $H_1$ such that $\dim G_1 = \infty$ and $\|v\|^2_{G_1} \equiv \|v\|^2_{H_1} + \|A_{2,1}v\|^2_{G_2} < \infty$. Then $G_1$ is a Hilbert space and the restriction $A_{2,1,G_1} \in \mathcal{L}(G_1; G_2)$ of $A_{2,1}$ to $G_1$ is normally invertible.

We recall that a pre-Hilbert space $H$ is called a Hilbert space if it is complete and separable and $\dim H = \infty$; if $\dim H < \infty$, the term Euclidean space is usually preferred.



2 Examples of Normally Invertible Operators and
Regularized Problems

2.1 The Divergence Operator; Elasticity and Hydrodynamics 
Problems 

In what follows, we assume, for simplicity, that $\Omega$ is a bounded domain in the Euclidean space $\mathbb{R}^d$, $d = 2, 3$, with Lipschitz piecewise smooth boundary $\Gamma \equiv \partial\Omega$ and $\bar\Omega \equiv \Omega \cup \Gamma$. We write

$$(u,v)_{0,\Omega} \equiv (u,v)_{L_2(\Omega)}, \quad |u|^2_{0,\Omega} \equiv (u,u)_{0,\Omega}, \quad |u|^2_{1,\Omega} \equiv (u,u)_{1,\Omega} \equiv (|\nabla u|^2, 1)_{0,\Omega}$$

and make use of the Sobolev space $W_2^1(\Omega) \equiv H^1(\Omega)$ (see [3,4,6]) with the norm

$$\|u\|_{H^1(\Omega)} \equiv \|u\|_{1,\Omega} \equiv [|u|^2_{0,\Omega} + |u|^2_{1,\Omega}]^{1/2}. \qquad (2.1)$$

For vector fields, the norms are defined in the same manner. Examples of problems from hydrodynamics and elasticity associated with the divergence operator $A_{2,1}$ for the corresponding vector fields can be found in [3,4,7] for various choices of boundary conditions; not only the Stokes system but also many of its generalizations were considered from the point of view of optimization of numerical methods and algorithms; special attention was paid to estimates of accuracy and computational work independent of parameters like $\alpha$ (see [3,7]).

2.2 The Trace Operator; Problems in Strengthened Sobolev Spaces; 

New Penalty Methods for the Dirichlet Conditions 

The strengthened Sobolev spaces are naturally connected, e.g., with such im- 
portant (two or three-dimensional) problems of mathematical physics as those 
in theory of plates and shells with stiffeners or in the hydrodynamics involv- 
ing the surface tension (see [3, 7, 8, 9]). If d = 2, the model strengthened Sobolev 
space Gi^ra = G(17; S') = G, m = [m] > 1, is defined as a subset of functions 
in (see (1.1)) such that their traces on each Sr belong to H^{Sr) = 

W^{Sr), SO we can define the norm in G by 

r* 

Ikllc = 

r— 1 

S C G consists of straight line segments (stiffeners) Si, , Sr* (smooth arcs are 
also allowed). It was shown (see [9]) that Gi^ is a completion of the space of 
smooth functions in the sense of norm (2.2). 

These nonstandard Hilbert spaces allow one to pose correct variational and operator problems. Among possible spectral (eigenvalue) problems, we mention those that are reduced to the problems $Mu = \lambda Lu$ with $L \in \mathcal{L}^{+}(G)$ and symmetric and compact operators $M$; for such problems in our Hilbert spaces $G$, the classical Hilbert-Schmidt theorem holds (see [1,3]). Examples of problems on more involved composed manifolds of different dimensionality can be found in [8,9]. Special attention was paid to numerical methods based on the use of projective-grid methods and effective iterative methods such as multigrid and cutting methods; in the case where the original problems were badly conditioned, the reduction indicated above to problems (1.7) in the Hilbert space $G = G_1 \times G_2$ turned out to be very efficient (see [3,8,9]).

As is known, the homogeneous Dirichlet conditions can be understood in 
terms of the penalty method as a limit of natural boundary conditions of the 
type + (1 + 1 /q;)mq = 0, where n is the unit vector of the outer normal to 
the boundary F, a -1-0; the latter ones are connected with the additional term 
(penalty term) F(u) = (1-|- I/q;)]^]^ p in the minimized energy functional. From 
the mathematical point of view, this penalty term might be considered as rather 
weak because of additional requirements on smoothness of the solutions in order 
to obtain optimal perturbation estimates ]]«« — m||i,j 7 + \ua — ujo,_r = 0(a). 
It was shown recently (see [9]) that such and even stronger estimates hold 
under correctness of the original problem if apply the stronger penalty term 
F{u) = (1 -I- l/a)||M]l^„^^j with u > 1/2 and treat the arising problem in the 
corresponding strengthened Sobolev space. Moreover, this approach leads to im- 
portant a posteriori estimates with no additional assumptions on the solution in 
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contrast to estimates based on respective estimates of residuals; similar results 
hold for spectral problems (see [9,10]). The results obtained yield also under- 
standing of mechanism of splitting of the problem under consideration (with 
large penalty parameters on S) into separate ones in subdomains with the ho- 
mogeneous Dirichlet conditions. The case with = 1 is especially important 
since efficient numerical methods were indicated. 



2.3 The Jump Operator; Problems in Weakened Sobolev Spaces 

In what follows, we make use of a partition

^ = uiliA (2.3) 

into blocks with Lipschitz and piecewise smooth boundaries df2i = Fi. The 
factored Sobolev space (associated with (2.3)) is 

i* 

i=l 

The boundary Fi consists of blocks Fij = Fi D Fj; they constitute R = 
For w € Hj, we define the local jump Jijw of traces on Fij as 

Jijw = Tr m{Oj)^L 2 {rij)W - Tr m(Oi)^L 2 {rij)W. 

If we assume, for simplicity of presentation, that different Tjj = Rr are sep- 
arated, then the weakened Sobolev space Ai^i = R) (see [9-11]) 

consists of ic G F[j such that the jumps of the traces of w on each Fij belong 
to i/i(A,); 



^ ^ (2-4) 

i—1 i<.j 

Ai^i is a strengthened F[j; is a subspace in which explains the term 

used above. It was proved (see [9-11]) that A\^i is a completion in norm (2.4) of 
the space of discontinuous functions such that their restrictions to each Qi are 
continuous and smooth functions; this is the case for more general spaces. 

Weakened Sobolev spaces and related F[j are well suited for mathematical 
modelling of problems in composite structures where discontinuous solutions are 
allowed (problems with interfaces). Spaces of the relevant type are used in do- 
main decomposition methods, especially in the case of nonmatching grids. Our 
attention to the above spaces was motivated by the fact that problems in weak- 
ened Sobolev spaces have good perspectives from the point of view of obtaining a 
posteriori error estimates of solutions to classical elliptic boundary and spectral 
problems under no additional requirements of the solution smoothness. Effective 
numerical methods for solving elliptic problems in weakened Sobolev spaces were 
considered in [9-11].

2.4 The Restriction Operator; Elliptic Problems with Large Jumps in Coefficients

Iterative methods of various kinds for the discretized elliptic problems mentioned above have been considered in many papers (see [12-14] and references therein); probably, the first effective iterations were indicated in [15]. We concentrate here on estimates of accuracy of projective methods and on the construction of asymptotically optimal algorithms with estimates independent of the jumps in coefficients (such methods and algorithms are sometimes referred to as robust).

Below, we use a sufficiently simple part Fq C F with l/b|(d-i) > 0, where 
I l(ci-i) denotes the {d — l)-dimensional measure (the case Fq = F is allowed); 
we define Hi = H^{Q;Fo) as a standard subspace in H^{H) (it consists of 
functions v with zero traces on Fq). We also use an open set C C H such that 

C = utiCk, (2.5) 

where each Ck is a domain with Lipschitz and piecewise smooth boundary dCk', 
Ck consists of blocks A {Ck = UigTr(fc) A); the distance between different Ck is 
greater then 2p' > 0. We take p G (0, p')) and define Ck,p as a set of points whose 
distances to Ck are smaller then p; we define Hi^k,p as a subspace of Fq) 

consisting of functions with supports in Ck,p (the functions vanish at the points 
whose distances to Ck are greater then p); C in (2.5) will be associated with 
certain large coefficients l/ak > 0, fc G [l,fc*] and the Hilbert space 



k* 

H2=1[ Vk, (2.6) 

fc=i 

where each Vk is a special subspace in H^{Ck) specified below. For each k, we 
introduce Wk = H^{Ck', dCk H Fq). If \dCk H /oj(d-i) > 0 then Vk = Wk and we 
write k G ttq, {dCk n Fq) = Fkp, {dCk n F) \ Fk^ = Fkp. If \dCk n Fo](d_i) = 0 
then fc G 7Ti and Wk = H^{Ck)] Vk is now defined as a set of functions in Wk 
such that their traces on a piece "fk C (7/c of a smooth {d— l)-dimensional surface 
are orthogonal to 1 in the sense of L 2 {"fk) (the orthogonality condition is written 
in the form (pk{v) = 0); here l 7 fc](£j_i) > 0 and the case jk C dCk is allowed. The 
norm in each Vk is chosen as (see (2.1), (2.6)) \\v2,k\\vk = \v2,k\i,Ck ^ \\u2,k\\i,Ck'^ 
elements of H 2 are written as V 2 = [u 2 ,i, ■ . ■ ,r’ 2 ,fc*]- The key Hilbert space Hi 
is a subspace of Hi characterized by the conditions 

Pk{v) =0, yk e m; (2.7) 

the restriction operators of elements vi G Hi onto Ck and C are denoted by Rk 
and R respectively. 

Theorem 2.1. For all k G tti, suppose that the distance between Fq and each 
Ck with fc G 7Ti is greater then 2p > 0 (p < p'). For all k G ttq, suppose that the 
distance between Fkp and Fq \ Fk^ is greater then 2p and that the extension 







theorem of Vk to Hi,k,p holds (see [14]). Then there exists a constant K* > 0 
such that, for each V 2 G H 2 , it is possible to indicate a function u\ G Hi with 
properties: 

k* 

Rui = V2, WuiWh. < K*[Y^ ( 2 . 8 ) 

Proof Under the above assumptions, it suffices to construct an extension G 
Hi.k,p of each V 2 ,k G Vk to H\^k,p separately (the desired extension u\ can 
be taken as uip + • • • + ui^k*)- For each k G tt\, the classical extension the- 
orems (see [6,3]) yield a function W 2 ,k G iJ^(R‘*); its product with a smooth 
function gk{x) (it vanishes on f? \ Ck,p and equals to 1 on Ck) gives u\^k- For 
fc G 7To, instead of the classical extension theorems, more involved ones should be 
used (see [14]); they apply harmonic equation in Ck,p \ Ck with specially chosen 
Dirichlet conditions. 

Note that Theorem 2.1 implies that the restriction operators Rk and R are 
normally invertible. It is important that grid extension theorems can also be 
obtained; they deal, e.g., with piecewise linear functions i) 2 ,k and ui defined on 
triangulations Th(Ck) and Th(fl) of Ck and 17 respectively; K* (see (2.8)) in 
these grid theorems does not depend on h (see [3,14] and (1.13)); domains with 
non-Lipschitz boundaries are allowed (see [3,14]). 

As an example of elliptic problems with large jumps in coefficients we consider 
the problem of finding u G Hi such that 

i* 

b{u] u') = '^cf\u/ f2i-,u' / Qi)i^n^ + 

i^l 

k * 

+ V — V c^\u/np,u'/H,)i,n, = l(u'), W G Hi, (2.9) 

— oik — 

k=i leTi-(fc) 

where I G H], all and ak are positive constants, but ak are relatively 

small. Correctness of problem (2.9) is obvious (see (1.11)). We can reduce it to 
problem (1.7), (1.12) if, instead of (2.6), we define H 2 and as 



k* 

772 = n 



(2.10) 



11^2,^111,,,,^ E (2.11) 

It is important that the norms in 772, fc and Vk are equivalent; the same holds for 
the old and new norms in H 2 ■ This enables us to consider the restriction operator 
R G U(77i; H 2 ) as normally invertible and deal with basic problem (1.7), (1.12), 
(2.10), (2.11). It seems natural to assume that 

*e[l,U], 



( 2 . 12 ) 
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where u — the solution of (2.9) and 7 G (0, 1]. In the same manner as for Hi (see 
[3]) it can be verified that inequalities (2.12) determine a compact set in Hi; for 
the iV(e)-width in the sense of Kolmogorov for this set, we have iV(e x , 
where £ > 0 is a prescribed tolerance and N{e) corresponds to the dimension of 
the used approximating subspace of Hi. 

Under conditions (2.12), asymptotically optimal projective-grid methods for 
(1.7), (1.12) can be constructed on the base of quasiuniform triangulation s of 
blocks A and spline subspaces Hr,h of dimensions Nr = 0{N{e)), r = 1, 2 (see 
[3,14]); we assume that Th{H) are consistent with geometry of blocks A and 
Fq so the subspaces Hi^h C Hi and Hi^h C Hi yield subspaces Wk,h C Wk, 
H 2 .k,h C 772, fc, H 2 ,h C 772. Estimates of accuracy jjw — u\\h < KN for such 
methods are independent of all € [0, cq]. 

Our projective-grid method yields grid systems 



^1,1 ^1,2 




Ul 




■fl' 


_A2,1 — ^2,2 




U2 




0 



(2.13) 



in the standard Euclidean space s H = Hi x H2, dim Hi = dim77i, dimH2 = 
dim772. In (2.13), ^2,2 = diag(ai7l2,i, . . . , afc*2l2,fc*)) ^2,fc is a corresponding 
analog of the identity operator in 772, ^ {^ 2 ,k is a Gram matrix), k G [l,fc*]- 
If Th{H) = T<^p^D) is obtained as a result of a refinement procedure that is 
applied recursively p times for an initial coarse triangulation r*-°^(f?) with p x 
|ln/i|, then, for Ai^i in (2.12), there exists an asymptotically optimal model 
operator 7?i x Ai^ such that the constants of spectral equivalence and the 
estimates of the required computational work in solving systems with 7?i are 
independent of all numbers at' it is constructed in accordance with theory of 
model cooperative operators (see [3,9]) based on proper multigrid splittings of 
the spline space Hi^h (hierarchical basis for it is used). The same applies to 
model operators B 2 for the block-diagonal operator yl2 = diag(yl2,i, . . . , T2,fc»); 
there is only a relatively new problem connected with the use of the basis in 
772, fc if fc G 7Ti (see (2.7)). But it can be reduced (see [3], Section 8.3) to a similar 
standard problem with the natural basis in Wk and the corresponding Gram 
matrix (it is nonnegative). Hence, we can construct an asymptotically optimal 
model operator B G £+(H) such that Bu = [7?iUi, 7?2U2], Vu G H, and apply 
effective iterations 



Hu”+i = Bu”-r„H*B-i(Hu”-f). (2.14) 

A combination of (2.14) with the multigrid continuation procedure leads to a justification of the Bakhvalov-Kolmogorov principle with estimates of computational work independent of all $\alpha_k \in [0, c_0]$ (see [3,14]). Instead of (2.14), the modified conjugate gradient iterations can also be used.

It should be noted that problem (2.9) in Hi is rather unusual if the set tti is 
nonempty and m > 1 conditions (2.7) are necessary (we take tti as a set of first m 
indexes). But Hi = Hi + lin[ei, . . . , Cm], where basic functions ei, . . . , Cm can be 
easily constructed; moreover, we can assume that Pk(ej) = 5k, j- This enables us 
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to reduce the original operator problem in to m + 2 problems vci H\ in the 
same manner as it is done in the well-known block elimination procedure (see 
[3], Section 1.5). 

3 Parabolic Problems with Large Jumps in Coefficients 

3.1 Discretized Parabolic Problems with Large Jumps in 
Coefficients 

For nonstationary problems in Qt = 12 x [0, T], we take t = T /n* and write = 

nr, u” = Ur{tn) (n = 0, . . . , n*), dou" = [u” — (n = 1, . . . , fc < n*). 

Hereafter, $H_1$, $H_2$, $H = H_1 \times H_2$ are Hilbert spaces or Euclidean spaces. On the basis of (2.9) and as a typical example of parabolic problems and their discretizations, we consider a sequence of stationary problems in $H_1$

(9o<; Oo.r? + 6(w"; <) = (F^; <)o.r2 (3.1) 

where w" refers to arbitrary elements of Hi, F" G L 2 {fi), n > 1, Ui = 0 (this 
can be assumed without loss of generality). We stress that (3.1) is a problem involving very large parameters $1/\alpha_k$, which makes standard accuracy estimates of numerical methods (see [16]) rather unsatisfactory. To make them independent of $\alpha_k$, we apply the same regularization as for (2.9):

Mao< + Hi,i< + Hi,2U^ = M5”, (3.2) 

- A2,2U^ = A”, (3.3) 

where M G C{Hi), M = M* > 0, {Mui,vi)hi = (ui, wi)o.j7, HuiUm = 
\ui\o,a, Vui G Hi, Vui G Hi; \\Mui\\hi < ll^ll Iwijo.r?; /° = 0- 

Theorem 3.1. There exist constants tq and K, independent of all aj G [0,cq] 
and such that, for the solution of (3.2), (3.3), with t <tq and n = 1, - ■ ■ ,k < n* , 
the a priori estimate 

k 

ll^lllr, +rY^[\\do<rM + \K\\k + \W2\\% + II^2.i5o<|||,J < KF^, (3.4) 

n—1 

holds, where 

k 

F,^T^{\\gnlr + \\domk)- (3-5) 

n—1 

Proof Restriction (3.3) implies that 

— {A2pBoUi + A 2 ^ 2 doU 2 , U2) H2 = —{diif2,vdP)H2- 



(3.6) 
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The inner product of each part of (3.2) and doUi in Hi yields an equality; 
summing up the obtained equality and (3.6), we obtain 

k 

X = T ^ {WdoUiWlj + {Ai^iUi,BoUi)hi + {A2,2U2,doU2)H2) = 

n—1 



k 

= rJ2{{9i,do<)Mo - (5o/2”,0/t.) = (3.7) 

n—1 

It can be easily verified that 

k 

X>tY^ Po^fMo + (gr,5o<)M < + MWm (3.8) 

n—1 

For the second term on the right-hand side of (3.7), we have 

k k 

Z^-rJ2idof^,u^)H2; \Z\<rJ2\\domH2\\u^\\H2- (3.9) 

n—1 n=l 

It is important that 1 1^2 1 1^2 can be estimated from above as 

\\u^\\h2 < K' (||M/n|^, + WMBhu-Wh, + (3.10) 

(see (3.2) and fundamental inequalities (1.4), (1.13)). Hence, 

k 

IZI <Krf^ (pao<||i, + (1 + 2M\\Bomk + \\<\\k) (3-11) 

n—1 

with a K > 0 and arbitrary 12 > 0. Combination of (3.8), (3.9), (3.11) yields the 
desired estimate for Y in (3.7) and an unequality of standard type, which, for 
small enough 12 > 0 and tq, leads to 

k k 

TY.\\<\\h<KoFk, \\u>[rH^+rY,\\dou-i\\li<KoFk. 

n=l n—1 

These estimates yield (3.4), (3.5) since we can apply (3.10) and \\A 2 ^iBoUi\\h 2 < 

II ^2,2^2 + Bof2\\H2 (all aj < Co). 
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Proper Weak Regular Splitting for M-Matrices 



István Faragó*

ELTE University, Dept. Applied Analysis 
H-1053 Budapest, Hungary 



Abstract. The iterative solution of the system of linear algebraic equa- 
tions Ax = b with a nonsingular M-matrix A is considered. A one-step 
iterative method is constructed which is based on the special weak regular 
splitting of the matrix A. We prove that the obtained iterative method 
is not only convergent but it has also some further advantageous prop- 
erties: the maximal rate of convergence, the efficiency from the point 
of view of computational costs and the qualitative adequacy. We also 
examine the relation between this splitting and the regular splittings. 
Finally we construct two-sided monotone sequences converging to the solution of the above system. These sequences are produced by the iteration based on the weak regular splitting of A, with different suitable starting vectors. A possible method for determining these vectors is also indicated.



1 Introduction 

The solution of the system of linear algebraic equations 

Ax = b (1) 

with the given regular matrix $A \in \mathbb{R}^{n \times n}$ and the nonnegative vector $b \in \mathbb{R}^{n}$ is a basic problem of numerical methods. A considerable part of the applications results in such a system where A is an M-matrix, that is, a matrix with nonpositive offdiagonal elements for which there exists a positive vector $f_{pos}$ such that the vector $Af_{pos}$ is also positive.

In order to solve this problem we usually construct a one-step iteration of the form

$$x^{(j+1)} = Tx^{(j)} + g, \quad j = 0, 1, \ldots \qquad (2)$$

Here the question is the construction of the iteration matrix T and the vector g. The usual approach [1,9] is to determine them through a splitting of the matrix A,

$$A = M - N, \quad M \text{ is regular}, \qquad (3)$$

by the formulas

$$T = M^{-1}N, \quad g = M^{-1}b. \qquad (4)$$

Clearly, for convergent iteration matrices the iteration (2) converges to the solution of equation (1).

* This research was supported by the Hungarian National Research Funds OTKA
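The iteration (2) with T and g from (4) can be sketched in a few lines. The example below is an illustration only: the test matrix `A`, the right-hand side `b`, the helper name `splitting_iteration` and the choice of the Jacobi splitting (M equal to the diagonal of A) are assumptions made here, not choices taken from the paper. Each step solves a system with M instead of forming $M^{-1}$ explicitly.

```python
import numpy as np

def splitting_iteration(M, N, b, x0, steps):
    """Run x^{(j+1)} = M^{-1}(N x^{(j)} + b), i.e. iteration (2) with T, g from (4)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        x = np.linalg.solve(M, N @ x + b)
    return x

# Illustrative M-matrix (tridiagonal, strictly diagonally dominant) and Jacobi splitting.
A = 4.0 * np.eye(5) - np.eye(5, k=1) - np.eye(5, k=-1)
b = np.ones(5)
M = np.diag(np.diag(A))
N = M - A
T = np.linalg.solve(M, N)                 # iteration matrix M^{-1} N
rho = max(abs(np.linalg.eigvals(T)))      # spectral radius of T
x = splitting_iteration(M, N, b, np.zeros(5), 60)
```

For this particular matrix the spectral radius `rho` is below one, so the iterates approach the solution of `A x = b`; starting from the zero vector they also stay nonnegative.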

However, in addition to convergence we have to impose some further requirements on the iteration. Our aim is to choose a splitting such that the iteration (2), (4) satisfies the following expectations:

1. Maximal rate of convergence. The spectral radius $\rho(T)$ is as small as possible.

2. Efficiency. The iterative method (2), (4) is called efficient if $M^{-1}$ is easily obtainable and the computational cost of the splitting (3) is low.

3. Qualitative adequacy. The iterated vectors $x^{(j)}$ for all fixed j preserve the main qualitative properties of the solution vector x.

Assume that the nonsingular matrix A and the convergent matrix T are any fixed matrices. Clearly, the matrices defined by

$$M = A(I - T)^{-1}, \quad N = M - A,$$

where I denotes the unit matrix, form a splitting of A and $T = M^{-1}N$ [7]. So, if $\varepsilon \in (0, 1)$ is arbitrary and $T = \varepsilon I$, then the splitting has the form

$$M = \frac{1}{1 - \varepsilon}A, \quad N = \frac{\varepsilon}{1 - \varepsilon}A, \qquad (5)$$

and the spectral radius of the iteration matrix in the splitting based on (5) is equal to $\varepsilon$, that is, arbitrarily small. Therefore this splitting satisfies the first requirement.

With respect to the second requirement, a splitting is called efficient if M has a suitable form and its computation is not too difficult, for instance, in the case when M is triangular. In [8] it is proved that, provided the LU-decomposition of the matrix A exists, the splitting of the form

$$M = LD^{-1}, \quad N = M - A \qquad (6)$$

satisfies both the first and the second requirements.

Under the assumptions made, the solution of the system (1) is nonnegative. Therefore our aim is to preserve this property during the whole iteration; that is, if the iteration is stopped after any step, the approximate solution has to be nonnegative. In order to guarantee the third requirement, we are able to analyse some other basic qualitative properties of the iteration process [3,4,5]. As the results show, if the nonnegativity of the initial vector is preserved, then the basic qualitative properties are also preserved during the iterative process. Therefore our aim is to construct a weak regular splitting of the matrix A, that is, a splitting of the form (3) with a monotone matrix M and a nonnegative matrix $M^{-1}N$. Obviously, if M is monotone and N is nonnegative, that is, (3) defines a regular splitting, then it is a weak regular splitting, too.

In the sequel the decomposition (3) is called a proper splitting if the corresponding iteration is convergent and satisfies all three requirements above.

These requirements usually conflict when a splitting is chosen, because they lead to inconsistent conditions. As one can easily see, the splitting (5) has a maximal rate of convergence and on the class of monotone matrices it is a weak regular splitting. However, it is not efficient: the computation of $M^{-1}$ is equivalent to the inversion of the matrix A. Therefore, even on the monotone matrices it is not a proper splitting. On the other hand, the splitting (6) has fast convergence and is efficient, but the third requirement is not satisfied for every matrix A.

Therefore the construction of a proper splitting is a complex task. Since for M-matrices the iterations defined by weak regular splittings are convergent [1,10], in the following we restrict our consideration to splittings of this kind.

The paper is organised as follows. In Section 2 we prove that the splitting (6) is a proper weak regular splitting on the M-matrices. We also show that this splitting is not a regular splitting of the matrix A. In Section 3 we construct two-sided monotone sequences converging to the solution of (1). These sequences are produced by the iteration (2), (4) based on the weak regular splitting of A, with different suitable starting vectors. We also present a possible method of choosing the starting vectors.

2 An Efficient Weak Regular Splitting of M-Matrices

In the following we show that for any regular M-matrix $A \in \mathbb{R}^{n \times n}$ there exists an efficient weak regular splitting.

First we formulate a statement, the proof of which follows immediately from the proof of the statement $E_{18}$ of Theorem 2.3 of Chapter 6 in [2].

Lemma 1. If A is an M-matrix then there exists an LU-decomposition

$$A = LU \qquad (7)$$

with regular triangular M-matrices L and U.

Assume that $\lambda_1, \lambda_2, \ldots, \lambda_n$ are any distinct fixed numbers in the interval (0,1) and introduce the notations

$$d_i = \frac{1 - \lambda_i}{u_{i,i}} > 0, \quad D = \mathrm{diag}(d_1, d_2, \ldots, d_n), \quad \Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n). \qquad (8)$$

Then the following theorem holds.

Theorem 1. The matrices

$$M = LD^{-1}, \quad N = M - A \qquad (9)$$

define a weak regular splitting of the matrix A.

Proof. Clearly (9) defines a splitting of A. We prove that it is a weak regular splitting. Since L is an M-matrix, $M^{-1} = DL^{-1}$ is the product of two nonnegative matrices, that is, M is monotone. Clearly, $N = L(D^{-1} - U)$, therefore we have

$$M^{-1}N = DL^{-1}L(D^{-1} - U) = I - DU. \qquad (10)$$

The matrix DU is an upper triangular M-matrix with the diagonal elements $1 - \lambda_i$, so $I - DU$ has the positive diagonal elements $\lambda_i$ and nonnegative offdiagonal elements, which proves the statement.

In the following we show the possibility of choosing a weak regular splitting with an arbitrarily small spectral radius of the iteration matrix $M^{-1}N$.

Theorem 2. Assume that $\varepsilon$ is an arbitrarily small positive number and the fixed distinct numbers $\lambda_i$ satisfy the conditions

$$\lambda_i < \varepsilon \quad \text{for all } i = 1, 2, \ldots, n. \qquad (11)$$

Then for the weak regular splitting of the form (9) the relation $\rho(M^{-1}N) < \varepsilon$ holds.

Proof. As we have proved, the diagonal elements of the matrix DU are $1 - \lambda_i$. Since DU is triangular, these numbers are its eigenvalues, and they are distinct. Therefore the matrix is diagonalizable in the form

$$DU = S(I - \Lambda)S^{-1}, \qquad (12)$$

where S is a regular matrix [6]. Using the relations (10) and (12) we get

$$M^{-1}N = I - S(I - \Lambda)S^{-1} = S\Lambda S^{-1}, \qquad (13)$$

therefore the relation $\rho(M^{-1}N) = \rho(\Lambda) = \max_i \lambda_i < \varepsilon$ holds.

Corollary 1. Since in the splitting (9) the matrix M is triangular, as a consequence of Theorems 1 and 2 this splitting satisfies our requirements, that is, it defines a proper splitting.
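The construction in (8), (9) can be tried out numerically. The sketch below makes several assumptions that are not in the paper: it uses a hand-written Doolittle factorization without pivoting (`lu_no_pivot`, a hypothetical helper, with unit diagonal in L), a small tridiagonal test matrix, and an arbitrary choice of distinct numbers `lam` in (0,1). It then forms $M = LD^{-1}$, $N = M - A$ and the iteration matrix $M^{-1}N$.

```python
import numpy as np

def lu_no_pivot(A):
    """Doolittle LU factorization without pivoting (A = L U, unit diagonal in L)."""
    n = A.shape[0]
    L, U = np.eye(n), np.array(A, dtype=float)
    for k in range(n - 1):
        for i in range(k + 1, n):
            L[i, k] = U[i, k] / U[k, k]
            U[i, k:] -= L[i, k] * U[k, k:]
    return L, U

def proper_splitting(A, lam):
    """Build M = L D^{-1}, N = M - A with d_i = (1 - lam_i) / u_ii, as in (8)-(9)."""
    L, U = lu_no_pivot(A)
    d = (1.0 - np.asarray(lam)) / np.diag(U)
    M = L / d              # divides column i of L by d_i, i.e. L @ diag(1/d)
    return M, M - A

A = 4.0 * np.eye(4) - np.eye(4, k=1) - np.eye(4, k=-1)
lam = np.array([0.01, 0.02, 0.03, 0.04])   # distinct numbers in (0, 1)
M, N = proper_splitting(A, lam)
T = np.linalg.solve(M, N)                  # iteration matrix M^{-1} N = I - D U
```

In this example M is lower triangular, its inverse and T are nonnegative up to rounding, and the spectral radius of T equals the largest entry of `lam`. The matrix N itself has negative entries here, so this particular splitting is weak regular without being regular.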

In the following we examine the sign-pattern of the matrix N in the weak regular splitting (9) for symmetric positive definite M-matrices A. In this case there exists the Cholesky factorization, that is,

$$A = LL^{T} \qquad (14)$$

with a regular lower triangular M-matrix L. Here the elements of L are defined by the formulas

$$l_{i,i} = \sqrt{a_{i,i} - \sum_{t=1}^{i-1} l_{i,t}^2}, \quad l_{j,i} = \frac{1}{l_{i,i}}\Big(a_{j,i} - \sum_{t=1}^{i-1} l_{j,t}l_{i,t}\Big), \quad i = 1, \ldots, n, \; j = i+1, \ldots, n. \qquad (15)$$

Obviously $l_{i,i} > 0$. With arbitrary numbers $\lambda_i$ from (0,1) we define the notations as before:

$$d_i = \frac{1 - \lambda_i}{l_{i,i}} > 0, \quad D = \mathrm{diag}(d_1, d_2, \ldots, d_n), \quad \Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n). \qquad (16)$$

Theorem 3. For the symmetric positive definite M-matrix A the splitting (9) with (15) and (16) defines a weak regular but not a regular splitting.

Proof. The weak regularity follows from Theorem 1. Therefore it is sufficient to prove that the condition $N \ge 0$ cannot be satisfied. Using (15), for the offdiagonal elements of the matrix N below the diagonal we have

$$n_{j,i} = \frac{l_{j,i}}{d_i} - a_{j,i} = \frac{1}{1 - \lambda_i}\Big(\lambda_i a_{j,i} - \sum_{t=1}^{i-1} l_{j,t}l_{i,t}\Big), \quad j > i. \qquad (17)$$

Since A and L are M-matrices, the right side of (17) is nonpositive. Moreover, there exists a negative offdiagonal element of A, which proves our statement.

We remark that by an analogous computation for the diagonal elements of the matrix N we obtain

$$n_{i,i} = \frac{l_{i,i}}{d_i} - a_{i,i} = \frac{1}{1 - \lambda_i}\Big(\lambda_i a_{i,i} - \sum_{t=1}^{i-1} l_{i,t}^2\Big). \qquad (18)$$

This relation shows that the sign of the diagonal elements depends on the choice of the numbers $\lambda_i$. However, if these numbers are chosen sufficiently small then the diagonal elements of N are negative.

3 Two-Sided Iterations

In this section we show how two-sided iterations converging to the solution of (1) can be constructed.

As before we assume that A is an M-matrix and (3) is a weak regular splitting. Then the iteration (2), (4) converges to the solution of (1) for any initial vector $x^{(0)}$. We show that a suitable choice of the initial vector results in vector sequences that converge monotonically to the solution from both directions.

Theorem 4. Assume that the vectors $v^{(0)}$ and $w^{(0)}$ satisfy the conditions $Av^{(0)} \le b$ and $Aw^{(0)} \ge b$, respectively. Then for the vector sequences $\{v^{(k)}\}$ and $\{w^{(k)}\}$ produced by the iteration (2), (4) the following statements are true:

1. They are convergent to the solution of (1), that is, to the vector x.

2. The vector sequence $\{v^{(k)}\}$ monotonically increases, that is, $v^{(k)} \le v^{(k+1)}$.

3. The vector sequence $\{w^{(k)}\}$ monotonically decreases, that is, $w^{(k)} \ge w^{(k+1)}$.

4. They form a two-sided bound for the solution, that is, for all $k \in \mathbb{N}$ the relation

$$v^{(k)} \le x \le w^{(k)} \qquad (19)$$

holds.



Proof. The first statement is already proved. On the basis of the iteration, the relation $Mv^{(1)} = Nv^{(0)} + b$ holds. On the other hand, due to the assumption we have $Mv^{(0)} \le Nv^{(0)} + b$. Using the monotonicity of the matrix M, these relations imply the relation $v^{(0)} \le v^{(1)}$. For any k the proof is based on induction: if $v^{(k-1)} \le v^{(k)}$ then, using the form of the iteration (2) and the nonnegativity of the matrix T, the relation $v^{(k+1)} - v^{(k)} = T(v^{(k)} - v^{(k-1)}) \ge 0$ holds. This proves the second statement. The third statement is proved in a similar manner. The last statement is an obvious consequence of the first three statements.

An important question is the possibility of choosing the suitable vectors
$v^{(0)}$ and $w^{(0)}$. Using the fact that $A$ is an M-matrix we can give a method
for their determination. Clearly, the diagonal matrix $\mathrm{diag}A$ is a nonsingular,
nonnegative matrix. Let us denote by $v$ the nonnegative solution of the easily
solvable equation

    $\mathrm{diag}A\, v = b$.   (20)

Since the off-diagonal elements of $A$ are nonpositive, the relation
$0 \le (\mathrm{diag}A - A)v = b - Av$ holds, that is $b \ge Av$. So, the choice $v^{(0)} = v$ is
suitable. Using the notations



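    $1_n = [1, 1, \ldots, 1]^T \in \mathbb{R}^n$,

    $a_{\max} = \max a_{ij} > 0$, $b_{\min} = \min b_i$, $\gamma_1 = b_{\min}/a_{\max}$,

we can observe that the vector $v = \gamma_1 1_n$ also satisfies the condition $b \ge Av$,
that is we can choose it as the initial vector $v^{(0)}$.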

In order to choose the suitable vector $w^{(0)}$ we use the positive vector $f_{pos}$
defined in the definition of the M-matrices. If we introduce the notations

    $b_{\max} = \max b_i$, $f_{\min} = \min (Af_{pos})_i$, $\gamma_2 = b_{\max}/f_{\min}$,

then for the vector $w = \gamma_2 f_{pos}$ the relation

    $(Aw)_i = \gamma_2 (Af_{pos})_i \ge b_{\max} \ge b_i$

holds, that is $Aw \ge b$. Therefore the choice $w^{(0)} = w$ is suitable. We remark
that for the diagonally dominant M-matrices the vector $f_{pos}$ can be chosen as
$f_{pos} = 1_n$. Therefore, in this case the choice $w^{(0)} = \gamma_2 1_n$ is suitable with
$\gamma_2 = b_{\max}/s_{\min}$, where $s_{\min} > 0$ denotes the minimum of the row sums of the
matrix $A$.

Finally we remark that the two-sided bound (19) can be successfully applied 
to construct a stopping criterion of the iteration. 
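
The two-sided bound and the stopping criterion it suggests can be illustrated by a minimal sketch. It assumes the Jacobi splitting (the diagonal of $A$ as $M$), a strictly diagonally dominant M-matrix and $b \ge 0$; the function name `two_sided_jacobi`, the tolerance and the test matrix are illustrative choices, not taken from the paper.

```python
def two_sided_jacobi(A, b, tol=1e-10, max_iter=10_000):
    """Two-sided Jacobi iteration; A is assumed to be a diagonally dominant M-matrix, b >= 0."""
    n = len(b)
    diag = [A[i][i] for i in range(n)]
    s_min = min(sum(row) for row in A)        # minimal row sum, assumed positive
    v = [b[i] / diag[i] for i in range(n)]    # lower start: diag(A) v = b, as in (20)
    w = [max(b) / s_min] * n                  # upper start: w = gamma_2 * 1_n

    def jacobi_step(x):
        # One step of the Jacobi splitting: x_new = D^{-1} (b - (A - D) x)
        return [(b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / diag[i]
                for i in range(n)]

    steps = 0
    # Stopping criterion based on the two-sided bound v <= x <= w
    while max(wi - vi for vi, wi in zip(v, w)) > tol and steps < max_iter:
        v, w = jacobi_step(v), jacobi_step(w)
        steps += 1
    return v, w, steps


A = [[4.0, -1.0, 0.0], [-1.0, 4.0, -1.0], [0.0, -1.0, 4.0]]
b = [1.0, 2.0, 3.0]
lower, upper, steps = two_sided_jacobi(A, b)
print(steps, lower, upper)
```

When the loop stops, every component of the solution lies between the corresponding components of `lower` and `upper`, so the gap between them is a computable error bound.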
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Abstract. The error generated by the classical upwind finite difference 
method on a uniform mesh, when applied to a class of singularly per- 
turbed model ordinary differential equations with a singularly perturbed 
Neumann boundary condition, tends to infinity as the singular perturba- 
tion parameter tends to zero. Note that the exact solution is uniformly 
bounded with respect to the perturbation parameter. For the same classi- 
cal finite difference operator on an appropriate piecewise-uniform mesh, 
it is shown that the numerical solutions converge, uniformly with respect 
to the perturbation parameter, to the exact solution of any problem from 
this class. 

1 Introduction 

Consider the following class of linear one dimensional convection-diffusion problems

    $L_\varepsilon u_\varepsilon \equiv \varepsilon u_\varepsilon'' + a(x) u_\varepsilon' = f(x)$, $x \in \Omega = (0, 1)$,   (1a)
    $\varepsilon u_\varepsilon'(0) = A$, $u_\varepsilon(1) = B$,   (1b)
    $a, f \in C^2(\bar\Omega)$, $a(x) \ge \alpha > 0$, $x \in \bar\Omega$.   (1c)

Note that a Neumann boundary condition has been specified at $x = 0$. We recall
the comparison principle for this problem (see [2], for example).

Theorem 1. Assume that $v \in C^2(\bar\Omega)$. Then, if $v'(0) \le 0$, $v(1) \ge 0$ and
$L_\varepsilon v(x) \le 0$ for all $x \in \Omega$, it follows that $v(x) \ge 0$ for all $x \in \bar\Omega$.

From this we can easily establish the following stability bound on the solution

    $|u_\varepsilon(x)| \le |u_\varepsilon(1)| + \frac{1}{\alpha}|\varepsilon u_\varepsilon'(0)|\, e^{-\alpha x/\varepsilon} + \frac{1}{\alpha}\|f\|(1 - x)$.



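Lemma 1. [2] The derivatives of the solution of (1) satisfy the bounds

    $\|u_\varepsilon^{(k)}\| \le C\varepsilon^{-k} \max\{\|f\|, \|u_\varepsilon\|\}$, $k = 1, 2$,
    $\|u_\varepsilon^{(3)}\| \le C\varepsilon^{-3} \max\{\|f\|, \|f'\|, \|u_\varepsilon\|\}$,

where $C$ depends only on $\|a\|$ and $\|a'\|$.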

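Consider the following decomposition of the solution $u_\varepsilon$

    $u_\varepsilon = v_\varepsilon + w_\varepsilon$, $v_\varepsilon = v_0 + \varepsilon v_1 + \varepsilon^2 v_2$, $w_\varepsilon = w_0 + \varepsilon w_1$   (2a)

where the components $v_0$, $v_1$ and $v_2$ are the solutions of the problems

    $a v_0' = f$, $v_0(1) = u_\varepsilon(1)$,   (2b)
    $a v_1' = -v_0''$, $v_1(1) = 0$,   (2c)
    $L_\varepsilon v_2 = -v_1''$, $\varepsilon v_2'(0) = 0$, $v_2(1) = 0$   (2d)

and the components $w_0$, $w_1$ are the solutions of

    $L_\varepsilon w_0 = 0$, $\varepsilon w_0'(0) = \varepsilon u_\varepsilon'(0)$, $w_0(1) = 0$,   (2e)
    $L_\varepsilon w_1 = 0$, $\varepsilon w_1'(0) = -v_\varepsilon'(0)$, $w_1(1) = 0$.   (2f)

Thus the components $v_\varepsilon$ and $w_\varepsilon$ are the solutions of the problems

    $L_\varepsilon v_\varepsilon = f$, $v_\varepsilon'(0) = v_0'(0) + \varepsilon v_1'(0)$, $v_\varepsilon(1) = u_\varepsilon(1)$,
    $L_\varepsilon w_\varepsilon = 0$, $w_\varepsilon'(0) = u_\varepsilon'(0) - v_\varepsilon'(0)$, $w_\varepsilon(1) = 0$.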

Also, they satisfy the bounds given in the following lemma. 
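Lemma 2. The components $v_\varepsilon$, $w_\varepsilon$ and their derivatives satisfy

    $\|v_\varepsilon^{(k)}\| \le C(1 + \varepsilon^{2-k})$, $k = 0, 1, 2, 3$,
    $|w_\varepsilon^{(k)}(x)| \le C(\varepsilon|u_\varepsilon'(0)| + \varepsilon)\,\varepsilon^{-k} e^{-\alpha x/\varepsilon}$, $k = 0, 1, 2, 3$.

Proof. Use Lemma 1 and the fact that

    $\varepsilon\psi_\varepsilon(x) = \int_x^1 e^{-A(t)/\varepsilon}\,dt$, where $A(t) = \int_0^t a(s)\,ds$,

is the exact solution of

    $L_\varepsilon \psi_\varepsilon = 0$, $\varepsilon\psi_\varepsilon'(0) = -1$, $\psi_\varepsilon(1) = 0$.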
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2 Upwinding on a Uniform Mesh 

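In this section we examine the convergence behaviour of standard upwinding on
a uniform mesh. The Neumann boundary condition is discretized by the scaled
discrete derivative $\varepsilon D^+ U_\varepsilon(0)$:

    $L_\varepsilon^N U_\varepsilon \equiv \varepsilon\delta^2 U_\varepsilon + a(x_i) D^+ U_\varepsilon = f(x_i)$, $x_i \in \Omega^N$,   (3a)
    $\varepsilon D^+ U_\varepsilon(0) = \varepsilon u_\varepsilon'(0)$, $U_\varepsilon(1) = u_\varepsilon(1)$,   (3b)

where $\Omega^N$ is an arbitrary mesh. We now state a discrete comparison principle.

Theorem 2. [2] Let $L_\varepsilon^N$ be the upwind finite difference operator defined in (3)
and let $\Omega^N$ be an arbitrary mesh of $N+1$ mesh points. If $V$ is any mesh function
defined on this mesh such that

    $D^+V(x_0) \le 0$, $V(x_N) \ge 0$ and $L_\varepsilon^N V \le 0$ in $\Omega^N$,

then $V(x_i) \ge 0$ for all $x_i \in \bar\Omega^N$.

Hence, on an arbitrary mesh, the discrete solution $U_\varepsilon$ satisfies the bound

    $|U_\varepsilon(x_i)| \le |u_\varepsilon(1)| + \varepsilon|u_\varepsilon'(0)|\,\Phi_i + \frac{1}{\alpha}\|f\|(1 - x_i)$,

where $\Phi_i$ is the solution of the constant coefficient problem

    $\varepsilon\delta^2\Phi_i + \alpha D^+\Phi_i = 0$, $\varepsilon D^+\Phi_0 = -1$, $\Phi_N = 0$.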



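Theorem 3. Let $u_\varepsilon$ be the continuous solution of (1) and let $U_\varepsilon$ be the numerical
solution generated from the upwind finite difference scheme (3) on a uniform
mesh $\Omega^N$. Then,

(a) if $|u_\varepsilon'(0)| \le C$, we have

    $\|\bar U_\varepsilon - u_\varepsilon\|_{\bar\Omega} \le CN^{-1}$

where $\bar U_\varepsilon$ is the linear interpolant of $U_\varepsilon$ and $C$ is a constant independent of $N$
and $\varepsilon$. Also,

(b) if $\varepsilon|u_\varepsilon'(0)| = C \ne 0$, then for any fixed $N$,

    $\|U_\varepsilon\| \to \infty$ as $\varepsilon \to 0$.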

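Proof. (a) Consider first the case of $|u_\varepsilon'(0)| \le C$. The discrete solution $U_\varepsilon$ can
be decomposed into the sum

    $U_\varepsilon = V_\varepsilon + W_\varepsilon$

where $V_\varepsilon$ and $W_\varepsilon$ are respectively the solutions of the problems

    $L_\varepsilon^N V_\varepsilon = f(x_i)$, $x_i \in \Omega^N$, $\varepsilon D^+V_\varepsilon(0) = \varepsilon v_\varepsilon'(0)$, $V_\varepsilon(1) = v_\varepsilon(1)$,
    $L_\varepsilon^N W_\varepsilon = 0$, $x_i \in \Omega^N$, $\varepsilon D^+W_\varepsilon(0) = \varepsilon w_\varepsilon'(0)$, $W_\varepsilon(1) = 0$.

We estimate the errors $V_\varepsilon - v_\varepsilon$ and $W_\varepsilon - w_\varepsilon$ separately. By standard local
truncation error estimates, we obtain

    $|L_\varepsilon^N(V_\varepsilon - v_\varepsilon)(x_i)| \le \frac{\varepsilon}{3}(x_{i+1} - x_{i-1})\|v_\varepsilon^{(3)}\| + \frac{a(x_i)}{2}(x_{i+1} - x_i)\|v_\varepsilon^{(2)}\| \le CN^{-1}$.

Note also that

    $|D^+(V_\varepsilon - v_\varepsilon)(0)| = |v_\varepsilon'(0) - D^+v_\varepsilon(0)| = \frac{1}{h}\left|\int_0^h (s - h)\, v_\varepsilon''(s)\,ds\right| \le CN^{-1}$.

With the two functions $\psi^\pm(x_i) = CN^{-1}(1 - x_i) \pm (V_\varepsilon - v_\varepsilon)(x_i)$, and the discrete
minimum principle for $L_\varepsilon^N$ we easily derive

    $|(V_\varepsilon - v_\varepsilon)(x_i)| \le CN^{-1}$.

Note that if $|u_\varepsilon'(0)| \le C$ then

    $|w_\varepsilon^{(k)}(x)| \le C\varepsilon^{1-k} e^{-\alpha x/\varepsilon}$, $k = 0, 1, 2, 3$.

The local truncation error for the layer component is given by

    $|L_\varepsilon^N(W_\varepsilon - w_\varepsilon)(x_i)| \le C\varepsilon^{-1}(x_{i+1} - x_{i-1})\, e^{-\alpha x_{i-1}/\varepsilon}$

and

    $|\varepsilon D^+(W_\varepsilon - w_\varepsilon)(0)| = \frac{\varepsilon}{h}\left|\int_0^h (s - h)\, w_\varepsilon''(s)\,ds\right| \le CN^{-1}$.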

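Introduce the two mesh functions

    $\Psi_i^\pm = \frac{C\lambda^2}{\gamma(\alpha - \gamma)} N^{-1} Y_i \pm (W_\varepsilon - w_\varepsilon)(x_i)$

where $\gamma$ is any constant satisfying $0 < \gamma < \alpha$ and

    $Y_i = \frac{\lambda^{N-i} - 1}{\lambda^N - 1}$, $\lambda = 1 + \frac{\gamma h}{\varepsilon}$, $h = 1/N$.

Note that $Y_i \le \lambda^{-i}$. It is easy to see that

    $\lambda^2 D^+Y_i \le -\frac{\gamma}{\varepsilon}\, e^{-\gamma x_{i-1}/\varepsilon}$

and so $Y_i$ decreases monotonically with $0 \le Y_i \le 1$. We then have $\varepsilon D^+\Psi_0^\pm \le 0$,
$\Psi_N^\pm = 0$ and using $(\varepsilon\delta^2 + \gamma D^+)Y_i = 0$, we obtain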

CX^ 

Lg^^ = 7 N-\a{xi) - -i)D+Yi ± {Wg - Wg){xi) 

7(0 - 7) 

< ~ < 0 . 
a — 7 
By the discrete minimum principle we conclude that $\Psi_i^\pm \ge 0$ and so for all $x_i \in \bar\Omega^N$

    $|(W_\varepsilon - w_\varepsilon)(x_i)| \le \frac{C\lambda^2}{\gamma(\alpha - \gamma)} N^{-1} Y_i \le C\lambda^2 N^{-1}$.

Thus, we have that

    $|(W_\varepsilon - w_\varepsilon)(x_i)| \le CN^{-1}$, when $h \le \varepsilon$.

From an integral representation of the truncation error, we have 

|Lf (We - u;e)(a;z)| < C [ 



As before we can establish 

|(We-u;e)(x,)| <CeA2-b 
Hence, for 1 < * < iV and e < ft., 

|(We - We)(xi)| < Ce\ < Ch. 

Note that 

\D+{W, - «;e)(0)| = ^\j\s - ft)<(s) ds\ < C 
which implies that 

|(We-U;e)(0)| < |(We-rCe)(xi)| + C'ft<C'ft. 

On the interval [xi^Xi^i] we have 

r^i+i 



\{We - We){x)\ = \ w'^{t)dt- 



^2+1 



w'^{t)dt\ < CN ^||w' 



Combining this with the argument in [2] completes part (a). 

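(b) If we discretize the Neumann boundary condition $\varepsilon u_\varepsilon'(0) = C$ by the
standard discrete derivative $\varepsilon D^+U_\varepsilon(0) = C$ on a uniform mesh then

    $U_\varepsilon(h) = U_\varepsilon(0) + C\,\frac{h}{\varepsilon}$.

For a fixed distance $h$,

    $|U_\varepsilon(h) - U_\varepsilon(0)| \to \infty$ as $\varepsilon \to 0$.

Hence, on a uniform mesh, the discrete solution is not bounded independently of $\varepsilon$.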
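
The blow-up in part (b) can be made concrete in the constant coefficient case with $f = 0$, where both the continuous and the discrete solutions have closed forms. The following sketch assumes $a = 1$, $C = 1$, $B = 0$; these values and the function names are illustrative choices only.

```python
import math

def discrete_u0(eps, N, a=1.0, C=1.0, B=0.0):
    """U(0) for eps*delta^2 U + a*D+U = 0, eps*D+U(0) = C, U(1) = B on a uniform mesh.

    The discrete solution is U_i = c1 + c2 * lam**(-i) with lam = 1 + a*h/eps.
    """
    h = 1.0 / N
    lam = 1.0 + a * h / eps
    c2 = -C * lam / a                 # from eps * D+U(0) = C
    return B + c2 * (1.0 - lam ** (-N))

def exact_u0(eps, a=1.0, C=1.0, B=0.0):
    """u(0) for eps*u'' + a*u' = 0, eps*u'(0) = C, u(1) = B."""
    return B - (C / a) * (1.0 - math.exp(-a / eps))

for eps in (1e-1, 1e-3, 1e-5):
    print(eps, discrete_u0(eps, 16), exact_u0(eps))
```

For fixed $N$ the discrete value at $x = 0$ grows like $h/\varepsilon$, while the exact value stays bounded by $|B| + |C|/a$.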
Remarks. (i) We define a weak boundary layer by $\|u_\varepsilon'\| \le C$, that is, the
derivative is uniformly bounded with respect to $\varepsilon$. In this case, Theorem 3 states
that the solution can be approximated $\varepsilon$-uniformly on a uniform mesh. However,
the first derivative $u_\varepsilon'(x)$ is not approximated $\varepsilon$-uniformly by the discrete
derivative $D^+U_\varepsilon(x_i)$ on a uniform mesh. This can be checked by solving a
nontrivial constant coefficient continuous problem and its corresponding discrete
problem directly, setting $\varepsilon N = 1$ and then taking the limit as $N \to \infty$.

(ii) In the case of the constant coefficient problem (1), we observe that

    $\lim_{\varepsilon \to 0} |\varepsilon D^+U_\varepsilon(x_i) - \varepsilon u_\varepsilon'(x_i)| = 0$.

Thus, although the discrete solutions are unbounded as $\varepsilon \to 0$, the scaled discrete
derivatives are at least $\varepsilon$-uniformly bounded and, moreover, converge as $N \to \infty$
to $\varepsilon u_\varepsilon'(x_i)$ for each fixed $\varepsilon$. However, the scaled discrete derivatives $\varepsilon D^+U_\varepsilon(0)$
are not $\varepsilon$-uniformly convergent to $\varepsilon u_\varepsilon'(0)$. In contrast, for the problem

    $L_\varepsilon u_\varepsilon = \varepsilon u_\varepsilon'' + a(x) u_\varepsilon' = f(x)$, $x \in \Omega$,   (4a)
    $u_\varepsilon(0) = A$, $u_\varepsilon(1) = B$,   (4b)

with Dirichlet boundary conditions, we have that

    $\lim_{\varepsilon \to 0} |U_\varepsilon(x_i) - u_\varepsilon(x_i)| = 0$,

and that

    $\lim_{\varepsilon \to 0} |\varepsilon D^+U_\varepsilon(0) - \varepsilon u_\varepsilon'(0)| = O(1)$.

This can be seen easily from the explicit solutions to the constant coefficient
continuous problem

    $u_\varepsilon(x) = u_\varepsilon(0) + \frac{f}{a}x - \left(u_\varepsilon(0) - u_\varepsilon(1) + \frac{f}{a}\right)\frac{1 - e^{-ax/\varepsilon}}{1 - e^{-a/\varepsilon}}$

and the discrete problem

    $U_\varepsilon(x_i) = U_\varepsilon(0) + \frac{f}{a}x_i - \left(U_\varepsilon(0) - U_\varepsilon(1) + \frac{f}{a}\right)\frac{1 - \lambda^{-i}}{1 - \lambda^{-N}}$, $\lambda = 1 + ah/\varepsilon$.

3 Upwinding on a Piecewise-Uniform Mesh

Consider the same upwind finite difference scheme (3) on the piecewise-uniform
mesh

    $\Omega_\sigma^N = \{x_i \mid x_i = 2i\sigma/N,\ i \le N/2;\ x_i = x_{i-1} + 2(1 - \sigma)/N,\ N/2 < i\}$   (5a)

where the transition parameter $\sigma$ is fitted to the boundary layer by taking

    $\sigma = \min\{\frac{1}{2}, \frac{1}{\alpha}\varepsilon \ln N\}$.   (5b)

The next result shows that upwinding on this mesh produces an $\varepsilon$-uniform
numerical method.
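
As an illustration, the mesh (5a) with the transition parameter (5b) can be generated as follows. This is a minimal sketch for even $N$; the function name `shishkin_mesh` is an illustrative choice and does not appear in the paper.

```python
import math

def shishkin_mesh(N, eps, alpha):
    """Return (sigma, mesh points) for the piecewise-uniform mesh (5a)-(5b); N is assumed even."""
    sigma = min(0.5, eps * math.log(N) / alpha)     # transition parameter (5b)
    half = N // 2
    fine = [2.0 * sigma * i / N for i in range(half + 1)]          # N/2 intervals on [0, sigma]
    coarse = [sigma + 2.0 * (1.0 - sigma) * (i - half) / N         # N/2 intervals on [sigma, 1]
              for i in range(half + 1, N + 1)]
    return sigma, fine + coarse

sigma, mesh = shishkin_mesh(16, 1e-3, 1.0)
print(sigma, mesh[:3], mesh[-3:])
```

Half of the mesh intervals are placed in the layer region $[0, \sigma]$, whose width shrinks with $\varepsilon$, and the mesh reduces to the uniform one when $\sigma = 1/2$.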
Theorem 4. Let $u_\varepsilon$ be the continuous solution of (1) and let $U_\varepsilon$ be the numerical
solution generated from an upwind finite difference scheme (3) on the
piecewise-uniform mesh (5). Then, for all $N \ge 4$, we have

    $\|U_\varepsilon - u_\varepsilon\|_{\bar\Omega_\sigma^N} \le CN^{-1}\ln N$

where $C$ is a constant independent of $N$ and $\varepsilon$.

Proof. As for the uniform mesh we derive

    $|(V_\varepsilon - v_\varepsilon)(x_i)| \le CN^{-1}$.

When $\sigma = 1/2$, the mesh is uniform and applying the argument of the previous
theorem, we get

    $|(W_\varepsilon - w_\varepsilon)(x_i)| \le CN^{-1}\varepsilon^{-1} \le CN^{-1}\ln N$.

When $\sigma < 1/2$, the argument is divided between the coarse mesh and fine mesh
regions. Consider first the coarse mesh region $[\sigma, 1]$, where

    $|w_\varepsilon(x)| \le Ce^{-\alpha\sigma/\varepsilon} \le CN^{-1}$.

Using the discrete comparison principle, we get

    $|W_\varepsilon(x_i)| \le \varepsilon|w_\varepsilon'(0)|\,\Phi_i$

where $\Phi_i$ is the solution of the constant coefficient problem

    $\varepsilon\delta^2\Phi_i + \alpha D^+\Phi_i = 0$, $\varepsilon D^+\Phi_0 = -1$, $\Phi_N = 0$.

From an explicit representation of $\Phi_i$ one can show that

    $|\Phi_{N/2}| \le CN^{-1}$.

Hence, for $x_i \ge \sigma$

    $|W_\varepsilon(x_i) - w_\varepsilon(x_i)| \le |W_\varepsilon(x_i)| + |w_\varepsilon(x_i)| \le CN^{-1}$.

Consider now the fine mesh region. Using the same argument as in the previous
theorem we get

    $|(W_\varepsilon - w_\varepsilon)(x_i)| \le C\lambda^2 N^{-1}\ln N \le CN^{-1}\ln N$.

This completes the proof.
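
The contrast between the two meshes can be observed numerically. The sketch below solves the upwind scheme (3) by the Thomas algorithm on an arbitrary mesh for a constant coefficient instance of (1) and compares the nodal error on a uniform and on a piecewise-uniform mesh. The data $a = 1$, $f = 1$, $\varepsilon u_\varepsilon'(0) = 1$, $u_\varepsilon(1) = 0$ and all function names are assumptions made for illustration; they are not the experiments of the paper.

```python
import math

def shishkin_mesh(N, eps, alpha):
    """Piecewise-uniform mesh (5a) with sigma from (5b); N is assumed even."""
    sigma = min(0.5, eps * math.log(N) / alpha)
    half = N // 2
    return ([2.0 * sigma * i / N for i in range(half + 1)]
            + [sigma + 2.0 * (1.0 - sigma) * (i - half) / N for i in range(half + 1, N + 1)])

def upwind_solve(x, eps, a, f, A, B):
    """Solve eps*delta^2 U + a*D+U = f with eps*D+U(0) = A, U(1) = B (constant a, f)."""
    N = len(x) - 1
    h = [x[i] - x[i - 1] for i in range(1, N + 1)]      # h[i-1] = x_i - x_{i-1}
    lo, di, up, rhs = ([0.0] * (N + 1) for _ in range(4))
    di[0], up[0], rhs[0] = -eps / h[0], eps / h[0], A   # scaled Neumann condition
    for i in range(1, N):
        hm, hp = h[i - 1], h[i]
        lo[i] = 2.0 * eps / ((hm + hp) * hm)
        up[i] = 2.0 * eps / ((hm + hp) * hp) + a / hp   # diffusion plus upwinded convection
        di[i] = -(lo[i] + up[i])
        rhs[i] = f
    di[N], rhs[N] = 1.0, B                              # Dirichlet condition at x = 1
    for i in range(1, N + 1):                           # forward elimination
        m = lo[i] / di[i - 1]
        di[i] -= m * up[i - 1]
        rhs[i] -= m * rhs[i - 1]
    U = [0.0] * (N + 1)
    U[N] = rhs[N] / di[N]
    for i in range(N - 1, -1, -1):                      # back substitution
        U[i] = (rhs[i] - up[i] * U[i + 1]) / di[i]
    return U

def exact(x, eps, a, f, A, B):
    """Exact solution u = c1 + (f/a) x + c2 exp(-a x / eps) of the same problem."""
    c2 = (eps * f / a - A) / a
    c1 = B - f / a - c2 * math.exp(-a / eps)
    return c1 + f * x / a + c2 * math.exp(-a * x / eps)

def max_nodal_error(x, eps, a=1.0, f=1.0, A=1.0, B=0.0):
    U = upwind_solve(x, eps, a, f, A, B)
    return max(abs(U[i] - exact(x[i], eps, a, f, A, B)) for i in range(len(x)))

eps = 1e-6
for N in (64, 256):
    uniform = [i / N for i in range(N + 1)]
    print(N, max_nodal_error(uniform, eps), max_nodal_error(shishkin_mesh(N, eps, 1.0), eps))
```

With these data the error on the uniform mesh is of the order of $h/\varepsilon$, whereas on the piecewise-uniform mesh it stays small and decreases as $N$ grows, in line with Theorems 3(b) and 4.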

In [1], an essentially second order scheme is constructed on a piecewise-uniform
mesh, using a more complicated finite difference operator. As in [2],
the nodal error estimate for the simpler scheme presented here can easily be
extended to a global error estimate by simple linear interpolation. That is, we
have

    $\|\bar U_\varepsilon - u_\varepsilon\|_{\bar\Omega} \le CN^{-1}\ln N$

where $\bar U_\varepsilon$ is the linear interpolant of $U_\varepsilon$. Also, using the techniques in [2], one
can deduce that

<CiV-ilniV. 
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4 Parabolic Boundary Layers 



In this section, we introduce a new class of problems. Let $\Omega = (0, 1)$,
$D = \Omega \times (0, T]$ and $\Gamma = \Gamma_l \cup \Gamma_b \cup \Gamma_r$ where $\Gamma_l$ and $\Gamma_r$ are the left and right sides
of the box $D$ and $\Gamma_b$ is its base. Consider the following linear parabolic partial
differential equation in $D$ with Dirichlet-Neumann boundary conditions on $\Gamma$



LgUe{x,t) = 



cji^n r)n 

+ b{x,t)ue + d{x,t)^ = f(x,t), (x,t) G D, 



dx"^ 



dt 



-dUr 



Ue = (fib on Fb, on Fi, Ue = (fr On Fr 

d{x, t) > (5 > 0 and h{x, t) > /3 > 0, (x, t) € D 
= y/siPb{0), (Pb{l) = V3r(0). 



(6a) 

(6b) 

(6c) 

(6d) 



We have the comparison principle 

Lemma 3. Assume b,d € C^{D) and ijj G C’^{D) n C^{D). Suppose that >Q 
on Fb U Fr and < 0 on Fr- Then Lgip > 0 in D implies that ip > 0 in D. 

and the following stability bound 

Theorem 5. Letv be any function in the domain of the differential operator Lg. 
Then 



Ikll < (l + aT)max{||Let;||,||t;||r6urJ + 







where a = max-jj{0, {1 — b) / d\ < 1/ 5. 

Assume that the data b, d, /, (p satisfy sufficient regularity and compatibility 
conditions so that the problem has a unique solution Ug and Ug G C^{D) and, 
furthermore, such that the derivatives of the solution Ug satisfy, for all non- 
negative integers i,j,Q <i + 2j < 4 



d^+^Ue 

dx^dP 



< C £-*/2 
D 



where the constant $C$ is independent of $\varepsilon$. We write the solution as the sum

    $u_\varepsilon = v_\varepsilon + w_\varepsilon$

where $v_\varepsilon$, $w_\varepsilon$ are smooth and singular components of $u_\varepsilon$ defined in the following
way. The smooth component is further decomposed into the sum

    $v_\varepsilon = v_0 + \varepsilon v_1$

where $v_0$, $v_1$ are defined by



TeVl 



d^VQ 

dx'^ 



bvo + d^ = f in D, 
at 

in D, v\ = Q on F\ Fi, 



Vo = Ug on Fb 



dvi 

dx 



0 on Fi. 



(7a) 

(7b) 
The singular component is decomposed into the sum

    $w_\varepsilon = w_l + w_r$

where $w_l$ and $w_r$ are defined by

    $L_\varepsilon w_r = 0$ in $D$, $w_r = u_\varepsilon - v_0$ on $\Gamma_r$, $w_r = 0$ on $\Gamma_b \cup \Gamma_l$,   (8a)

    $L_\varepsilon w_l = 0$ in $D$,   (8b)

    $\frac{\partial w_l}{\partial x} = \frac{\partial u_\varepsilon}{\partial x} - \frac{\partial v_\varepsilon}{\partial x} - \frac{\partial w_r}{\partial x}$ on $\Gamma_l$, $w_l = 0$ on $\Gamma_r \cup \Gamma_b$.   (8c)

It is clear that $w_l$, $w_r$ correspond respectively to the boundary layer functions on
$\Gamma_l$ and $\Gamma_r$. Assume that the data satisfy sufficient regularity and compatibility
conditions so that Vg,Wg G C\{D). 

Theorem 6. [3] For all non-negative integers i,j, sueh that 0 < i + 2j < 4 



d^+^Vg 

dx^dP 



_< C(l + e^-*/2) 
D 



and for all (x, t) G D, 



d^+^wi{x,t) 




d^+^Wr{x,t) 


dx'^dP 




dx^dP 






where C is a constant independent of e. 

Problem (6) is discretized using a standard numerical method composed of a 
standard finite difference operator on a fitted piecewise uniform mesh. 

L^Ug = -eSlUg + bUg + dDfUg = f, (x, t) G (9a) 

Ug = ug on U F^^, , D+Ug = ^ on F^^, (9b) 

where 

DN ^ F^ = D^ n F. (9c) 

A uniform mesh with $N_t$ mesh elements is used on $(0, T)$. A piecewise
uniform mesh on $\Omega$ with $N_x$ mesh elements is obtained by putting a uniform
mesh with $N_x/4$ mesh elements on both $(0, \sigma)$ and $(1 - \sigma, 1)$ and one with $N_x/2$
mesh elements on $(\sigma, 1 - \sigma)$, with the transition parameter

cr = min 2 -v/eln iVx|’ • (9d) 

We have the following discrete comparison principle 

Lemma 4. Assume that the mesh function F satisfies F > 0 on F^^ U F^^ 
and DfF <0 on Then L^F > 0 on implies that F >0 on . 
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The e-uniform error estimate is contained in 

Theorem 7. Letu^ he the continuous solution of (6) and let he the numerical 
solution generated from (9). Assume that Ve,We G Cf{D). Then, for all N > 4, 
we have 

sup \\Us - MellnN < CN~^ IniVj; -h 

0<e<l 

where C is a constant independent of Nx,Nt and e. 

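Proof. The argument follows [3]. The discrete solution $U_\varepsilon$ is the sum
$U_\varepsilon = V_\varepsilon + W_\varepsilon$ where $V_\varepsilon$ and $W_\varepsilon$ are the obvious discrete counterparts to $v_\varepsilon$ and $w_\varepsilon$. The
classical truncation error estimate yields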

\L^{V,-v,)\<CV^N-^ + CN^^ and \Dt {V, - v,){0,t)\ < CN~\ 

It follows that |I4 — Ug| < CN~^ + CNff^ . Note also that 

\L^{Wi - «;,)! < CN-^ InNx + CN~^ 

and 

^e\Dt{Wi-wi){t),t)\ < CN-^\nNx. 

The proof is completed as in [3] . 



5 Numerical Results 

In this section we present numerical results for the following specific elliptic 
problem 

du, 

sAue + = 16x(l -x){l- y)y, {x, y) G (0, 1)^ (10a) 

ox 

Mg = 1, (x, y)erRU/r (10b) 

du 

(x, y)erL, = -16x^(l - x)^, (x,y)e/B (10c) 

whose solution has a parabolic boundary layer near $\Gamma_B$. The nature of the
boundary layer function associated with this layer is related to the solutions of the
parabolic problems examined in the previous section. In Figure 1 we present
the numerical solution generated by applying standard upwinding on a uniform
mesh. The numerical solutions are not bounded uniformly with respect to $\varepsilon$
as $\varepsilon \to 0$. This should be compared with the accurate approximation given in
Figure 2, which was generated by applying standard upwinding on the
piecewise-uniform mesh




X , 
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Fig. 1. Numerical solution generated by upwinding on a uniform mesh 
with N=32, e=10“® for problem (10) 



where 

r = min ■! — , -\/e In N 

Note the significant difference in the vertical scale in these two figures. In Table 1
we present the computed orders of convergence (see [2]) generated by applying
standard upwinding on this piecewise-uniform mesh. These indicate that the
method is $\varepsilon$-uniformly convergent for problem (10).
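
The text refers to [2] for the definition of the computed orders of convergence. One common way of obtaining such orders, assumed here purely for illustration, is the double-mesh principle: if $D^N$ denotes the maximum difference between the numerical solutions on meshes with $N$ and $2N$ intervals, the computed order is $\log_2(D^N / D^{2N})$. The differences used below are made-up numbers, not data from the paper.

```python
import math

def computed_order(d_coarse, d_fine):
    """Double-mesh estimate of the order: log2(D^N / D^{2N})."""
    return math.log2(d_coarse / d_fine)

# Hypothetical double-mesh differences D^N for N = 8, 16, 32 (illustrative only)
differences = {8: 0.2, 16: 0.1, 32: 0.05}
orders = {N: computed_order(differences[N], differences[2 * N]) for N in (8, 16)}
print(orders)
```

A first-order method halves the difference each time the mesh is doubled, which gives computed orders close to 1.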
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Fig. 2. Numerical solution generated by upwinding on a piecewise-uniform mesh 
with N=S2, e=10“® for problem (10) 



Table 1. Computed orders of convergence generated by upwinding on a 
piecewise-uniform mesh applied to problem (10) 



                 Number of intervals N
ε           8      16     32     64     128

1           1.20   1.11   1.06   1.03   1.02
2^{-2}      1.23   1.14   1.07   1.04   1.02
2^{-4}      1.18   1.11   1.06   1.03   1.01
2^{-6}      1.20   1.17   1.09   1.04   1.02
2^{-8}      0.54   0.69   0.70   1.09   1.04
2^{-10}     0.54   0.77   0.81   0.82   0.83
2^{-12}     Q 55 Q Q Q g2 Q gg
2^{-14}     0.55   0.77   0.81   0.82   0.82
p^N         0.76   0.77   0.74   0.90   0.82
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Abstract. We construct a new numerical method for computing refer- 
ence numerical solutions to the self-similar solution to the problem of 
incompressible laminar flow past a thin flat plate with suction-blowing. 
The method generates global numerical approximations to the velocity
components and their scaled derivatives for arbitrary values of the
Reynolds number in the range $[1, \infty)$ on a domain including the boundary
layer but excluding a neighbourhood of the leading edge. The method is
based on Blasius' approach. Using an experimental error estimate technique
it is shown that these numerical approximations are pointwise accurate
and that they satisfy pointwise error estimates which are independent
of the Reynolds number for the flow. The Reynolds-uniform order
of convergence of the reference numerical solutions, with respect to the
number of mesh subintervals used in the solution of Blasius' problem, is
at least 0.86 and the error constant is not more than 80. The number
of iterations required to solve the nonlinear Blasius problem is independent
of the Reynolds number. Therefore the method generates reference
numerical solutions with $\varepsilon$-uniform errors of any prescribed accuracy.



1 Introduction 

The numerical solution of singularly perturbed boundary value problems, for
which the solutions exhibit boundary layers, gives rise to significant difficulties.
The errors in the numerical solutions of such problems generated by classical
numerical methods depend on the value of the singular perturbation parameter
$\varepsilon$, and can be large for small values of $\varepsilon$ [2]. For representative classes of
singular perturbation problems special methods have been constructed and shown
* This research was supported in part by the National Science Foundation grant DMS- 
9627244, by the Enterprise Ireland grant SC-98-612 and by the Russian Foundation 
for Basic Research grant No. 98-01-00362. 
theoretically to generate numerical approximations that converge $\varepsilon$-uniformly.
Also, numerical experiments have confirmed the efficacy of such methods in
practice [2]. Singularly perturbed boundary value problems, for which the solutions
exhibit boundary layers, frequently arise in flow problems with large Reynolds
number $Re$. In such problems the small parameter is $\varepsilon = Re^{-1}$. The discretization
of such problems gives rise to nonlinear finite difference methods for which there
is no known $\varepsilon$-uniform error analysis in the maximum norm. For this reason an
experimental method for justifying $\varepsilon$-uniform convergence is the only remaining
possibility. To make use of such a technique, especially for large $Re$, it is essential
to have a known $\varepsilon$-uniform reference solution which approximates the exact
solution to any prescribed accuracy. For flow problems with boundary layers there
is usually no known analytic solution that can be used as a reference solution,
and the same is true even for problems with a self-similar solution. Thus the
task of constructing a reference numerical solution with $\varepsilon$-uniform errors of any
prescribed accuracy arises for a wide class of flow problems.

An example of such a problem is flow past a flat plate with suction-blowing, 
for all Reynolds numbers for which the flow remains laminar and no separation 
occurs. For this problem it is important to construct a numerical method for 
which the pointwise errors in the scaled numerical solutions and their scaled 
derivatives are independent of the Reynolds number. In the present paper we 
consider the associated Prandtl problem of flow past a flat plate with suction- 
blowing. For large values of the Reynolds number the solution of this problem 
exhibits parabolic boundary layers in the neighbourhood of the plate, outside a 
neighbourhood of the leading edge. At the leading edge new singularities appear 
due to the incompatibilities of the problem data at the leading edge. Therefore, in 
the present paper we construct a numerical method which generates Reynolds- 
uniform reference numerical approximations to the scaled velocity components 
and their scaled derivatives for arbitrary values of the Reynolds number in a finite 
rectangular domain including the boundary layer but excluding a neighbourhood 
of the leading edge. This numerical method is based on the numerical solution 
of the related Blasius problem on the positive semi-axis. The accuracy of the 
numerical approximations depends on only the number of mesh subintervals N 
used for the solution of the Blasius problem. Our method is a development of 
that described in [2] for flow past a flat plate without suction-blowing. 

2 Formulation of the Problem 

We are required to find the solution, and its derivatives, of Prandtl's problem for
incompressible flow past a semi-infinite flat plate $P = \{(x, 0) \in \mathbb{R}^2 : x \ge 0\}$ with
suction-blowing in a bounded domain $D$, which adjoins the plate and contains
the boundary layer.
Prandtl's problem on the cut plane $\Omega = \mathbb{R}^2 \setminus P$ is described as follows

    $(P_P)$: Find $\mathbf{u}_P = (u_P, v_P)$ such that for all $(x, y) \in \Omega$
    $\mathbf{u}_P$ satisfies the differential equations

        $-\frac{1}{Re}\frac{\partial^2 u_P(x, y)}{\partial y^2} + \mathbf{u}_P \cdot \nabla u_P(x, y) = 0$,
        $\nabla \cdot \mathbf{u}_P(x, y) = 0$,

    with the boundary conditions

        $u_P(x, 0) = 0$, $v_P(x, 0) = v_0(x)$ for all $x \ge 0$,
        $\lim_{|y| \to \infty} \mathbf{u}_P(x, y) = \lim_{x \to -\infty} \mathbf{u}_P(x, y) = (1, 0)$, for all $x \in \mathbb{R}$,

where $v_0(x)$ is the vertical component of the suction-blowing velocity. This is a
nonlinear system of equations for the unknown components $u_P$, $v_P$ of the velocity
$\mathbf{u}_P$. The solution at all points in the open half plane to the left of the leading
edge is $\mathbf{u}_P = (1, 0)$. For special choices of the function $v_0$ the solution of $(P_P)$ is
self-similar, see (3) below.

Note that in Prandtl's problem, even without suction-blowing, the vertical
component of the velocity tends to infinity as we approach the leading edge. To
avoid this singularity, we choose the computational domain $D = (a, A) \times (0, B)$
where $a$, $A$ and $B$ are fixed positive numbers independent of $Re$. Our aim is to
construct a method for finding reference numerical approximations to the
self-similar solution and its derivatives of problem $(P_P)$ for arbitrary $Re \in [1, \infty)$
with error independent of $Re$.

We now describe conditions under which the solution of $(P_P)$ is self-similar.
Using the approach of Blasius, see [f], for example, a solution $\mathbf{u}_P = (u_P, v_P)$ of
$(P_P)$ can be written in the form

    $u_P(x, y) = u_B(x, y) = f'(\eta)$,   (1)

    $v_P(x, y) = v_B(x, y) = \sqrt{\frac{1}{2xRe}}\,(\eta f'(\eta) - f(\eta))$,   (2)

where

    $\eta = y\sqrt{Re/2x}$

and the function $f$ is the solution of the problem

    $(P_B)$: Find a function $f \in C^3([0, \infty))$ such that for all $\eta \in (0, \infty)$

        $f'''(\eta) + f(\eta) f''(\eta) = 0$

    with the boundary conditions

        $f(0) = f_0$, $f'(0) = 0$, $\lim_{\eta \to \infty} f'(\eta) = 1$.
$(P_B)$ is known as Blasius' problem and $\mathbf{u}_B = (u_B, v_B)$ is known as the Blasius
solution of $(P_P)$. The existence and uniqueness of a solution to this third order
nonlinear ordinary differential equation is discussed in [1]. Positive values of $f_0$
correspond to suction, while negative values of $f_0$ represent blowing, and $f_0$ is
related to $v_0$ in $(P_P)$ by the formula (see for example [3])

    $v_0(x) = -f_0\sqrt{1/2xRe}$.   (3)
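
The relations (1), (2) and (3) translate directly into code. The sketch below assumes that $f$ and $f'$ are available as callables; the placeholder functions used in the example only satisfy $f(0) = f_0$ and $f'(0) = 0$ and are not a solution of Blasius' problem. All function names are illustrative.

```python
import math

def similarity_eta(x, y, Re):
    """Similarity variable eta = y * sqrt(Re / (2x))."""
    return y * math.sqrt(Re / (2.0 * x))

def blasius_velocity(x, y, Re, f, f_prime):
    """Velocity components (u_B, v_B) from (1) and (2) for given callables f and f'."""
    eta = similarity_eta(x, y, Re)
    u = f_prime(eta)
    v = (eta * f_prime(eta) - f(eta)) / math.sqrt(2.0 * x * Re)
    return u, v

def suction_velocity(x, Re, f0):
    """Vertical velocity v_0(x) on the plate, formula (3)."""
    return -f0 * math.sqrt(1.0 / (2.0 * x * Re))

f0 = 3.0
f = lambda eta: f0 + 0.5 * eta * eta     # placeholder with f(0) = f0, NOT a Blasius solution
f_prime = lambda eta: eta                # its derivative, with f'(0) = 0
u_wall, v_wall = blasius_velocity(0.5, 0.0, 100.0, f, f_prime)
print(u_wall, v_wall, suction_velocity(0.5, 100.0, f0))
```

On the plate ($y = 0$) the horizontal component vanishes and the vertical component reduces to $v_0(x)$, which is how (3) follows from (2).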

The first order derivatives of the velocity components up and vp are given 
by 



dup dvp 



2xRe 






(4) 

(5) 

( 6 ) 

(7) 



From (1), (2), (4), (5), (6) and (7) we see that to find the velocity com- 
ponents Up and Up, and their first order derivatives, it is necessary to know 
vf'iv) ~ vf'iv) for all 77 G [0,oo). We also observe 

from these relations that, when Re is large, vp and are small and is 
large. Therefore, in order to have values of order unity, we use the following 
scaled components: '/Revp, and 

In the next section numerical approximations to the solution of {Pb), and 
its first order derivatives, are constructed on the semi-infinite domain [0,oo). 



3 Numerical Solution of Blasius’ Problem 

To find $u_P$ and $v_P$ and their first order derivatives we have to solve $(P_B)$ for $f$
and its derivatives on the semi-infinite domain $[0, \infty)$. This is not a trivial matter,
since numerical solutions can be obtained at only a finite number of mesh points.
For this reason, for each value of the parameter $L \in [1, \infty)$, we introduce the
following problem on the finite interval $(0, L)$

    $(P_{B,L})$: Find a function $f_L \in C^3(0, L)$ such that for all $\eta \in (0, L)$

        $f_L'''(\eta) + f_L(\eta) f_L''(\eta) = 0$

    with the boundary conditions

        $f_L(0) = f_0$, $f_L'(0) = 0$, $f_L'(L) = 1$.
The collection of all such problems forms a one-parameter family of problems
related to $(P_B)$, where the interval length $L$ is the parameter of the family.
Because the values of $f_L$, $f_L'$ and $f_L''$ are needed at all points of $[0, \infty)$, we
introduce the following extrapolations

    $f_L''(\eta) = 0$, for all $\eta \ge L$,   (8)

    $f_L'(\eta) = 1$, for all $\eta \ge L$,   (9)

    $f_L(\eta) = (\eta - L) + f_L(L)$, for all $\eta \ge L$.   (10)

To solve $(P_B)$, we first obtain a numerical solution $F_L$ of $(P_{B,L})$ on the finite
interval $(0, L)$ for an increasing sequence of values of $L$. Then, we extrapolate $F_L$ to
the semi-infinite domain $[0, \infty)$. The sequence of values of $L$ is defined as follows.
For each even number $N \ge 4$ define $L_N = \ln N$ (see [2] for motivation for this
choice of $L_N$) and consider the corresponding finite interval $[0, L_N]$. On $[0, L_N]$
a uniform mesh $\{\eta_i : \eta_i = iN^{-1}\ln N,\ 0 \le i \le N\}$ with $N$ mesh subintervals
is constructed. Then numerical approximations $F_L$, $D^+F_L$, $D^+D^+F_L$ to $f_L$,
$f_L'$, $f_L''$ respectively, are determined at the mesh points using the following
non-linear finite difference method

    $(P_{B,L}^N)$: Find $F$ on the mesh such that, for all mesh points $\eta_i$, $2 \le i \le N - 1$,

        $\delta^2(D^-F)(\eta_i) + F(\eta_i)\,D^+(D^-F)(\eta_i) = 0$,

        $F(0) = f_0$, $D^+F(0) = 0$, and $D^0F(\eta_{N-1}) = 1$.

We note that, in order to simplify the notation, we have dropped explicit use of
the indices $L$ and $N$. Thus, we denote the solution of $(P_{B,L}^N)$ by $F$ instead of $F_L^N$.

Since $(P_{B,L}^N)$ is non-linear, we use the following iterative solver to compute
its solution

    $(A^N)$: For each integer $m$, $1 \le m \le M$, find $F^m$ on the mesh
    such that, for all mesh points $\eta_i$, $2 \le i \le N - 1$,

        $\delta^2(D^-F^m)(\eta_i) + F^{m-1}(\eta_i)\,D^+(D^-F^m)(\eta_i) - D^-(F^m - F^{m-1})(\eta_i) = 0$,

        $F^m(0) = f_0$, $D^+F^m(0) = 0$, and $D^0F^m(\eta_{N-1}) = 1$,

    with the starting values for all mesh points $\eta_i$

Algorithm $(A^N)$ involves the solution of a sequence of linear problems, with one
linear problem for each value of the iteration index $m$. The total number of
iterations $M$ is taken to be $M = 8\ln N$. The motivation for this choice of $M$ is
described in [2]. It is important to note the crucial property that $M$ is
independent of the Reynolds number $Re$. The final output of algorithm $(A^N)$ is denoted
by $F$, where again we simplify the notation by omitting explicit mention of the
total number of iterations $M$. We follow the same criterion as in [2] to
determine $F$ on the finest required mesh as the "exact" solution. The corresponding
value of $N$ is denoted by $N_0$.
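
The discretization parameters described above depend on $N$ only. A minimal sketch of the mesh $\eta_i = iN^{-1}\ln N$ on $[0, L_N]$ and of the iteration count follows; since $8\ln N$ is in general not an integer, rounding it up is an assumption made here, and the function names are illustrative.

```python
import math

def blasius_mesh(N):
    """Uniform mesh eta_i = i * ln(N) / N, 0 <= i <= N, on [0, L_N] with L_N = ln N."""
    L_N = math.log(N)
    return [i * L_N / N for i in range(N + 1)]

def iteration_count(N):
    """Number of iterations M = 8 ln N, rounded up to an integer (an assumption)."""
    return math.ceil(8.0 * math.log(N))

mesh = blasius_mesh(2048)
print(len(mesh), mesh[-1], iteration_count(2048))
```

Neither the interval length nor the iteration count involves the Reynolds number, which is the property the text emphasises.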

To ensure that $F$, $D^+F$ and $D^+D^+F$ are defined at all points of each
mesh the following values are assigned: $D^+F(\eta_N) = 1$, $D^+D^+F(\eta_{N-1}) = 0$,
$D^+D^+F(\eta_N) = 0$. We then define $F$, $D^+F$ and $D^+D^+F$ at each point of
$[0, L_N]$ using piecewise linear interpolation of the values at the mesh points.
The resulting interpolants are denoted by $\bar F$, $\overline{D^+F}$ and $\overline{D^+D^+F}$ respectively.

In order to define $\bar F$, $\overline{D^+F}$ and $\overline{D^+D^+F}$ at each point $\eta \in [0, \infty)$ the following
extrapolations, analogous to (8), (9) and (10), are introduced

    $\overline{D^+D^+F}(\eta) = 0$, for all $\eta \in [L_N, \infty)$,   (11)

    $\overline{D^+F}(\eta) = 1$, for all $\eta \in [L_N, \infty)$,   (12)

    $\bar F(\eta) = \bar F(L_N) + (\eta - L_N)$, for all $\eta \in [L_N, \infty)$.   (13)

The values of $\bar F$, $\overline{D^+F}$ and $\overline{D^+D^+F}$, respectively, are the required numerical
approximations to $f$, $f'$, $f''$ of the Blasius solution and its derivatives at each
point of $[0, \infty)$.
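
The interpolation on $[0, L_N]$ combined with the extrapolations (11), (12) and (13) can be sketched as follows. The nodal values in the example are arbitrary illustrative numbers rather than computed Blasius data, and the function name `extend_to_half_line` is hypothetical.

```python
from bisect import bisect_right

def extend_to_half_line(eta_mesh, F, DF, DDF):
    """Return callables for F, D+F, D+D+F on [0, infinity).

    Inside [0, L_N]: piecewise linear interpolation of the nodal values.
    Beyond L_N: the extrapolations (11), (12) and (13).
    """
    L_N = eta_mesh[-1]

    def interpolate(values, eta):
        # Index of the mesh interval containing eta (the last interval if eta == L_N)
        j = min(bisect_right(eta_mesh, eta) - 1, len(eta_mesh) - 2)
        t = (eta - eta_mesh[j]) / (eta_mesh[j + 1] - eta_mesh[j])
        return (1.0 - t) * values[j] + t * values[j + 1]

    def F_bar(eta):
        return interpolate(F, eta) if eta <= L_N else F[-1] + (eta - L_N)   # (13)

    def DF_bar(eta):
        return interpolate(DF, eta) if eta <= L_N else 1.0                  # (12)

    def DDF_bar(eta):
        return interpolate(DDF, eta) if eta <= L_N else 0.0                 # (11)

    return F_bar, DF_bar, DDF_bar

# Illustrative nodal values on a three-point mesh (not Blasius data)
F_bar, DF_bar, DDF_bar = extend_to_half_line(
    [0.0, 0.5, 1.0], [0.0, 0.1, 0.4], [0.0, 0.5, 1.0], [1.0, 0.5, 0.0])
print(F_bar(0.25), F_bar(2.0), DF_bar(3.0), DDF_bar(3.0))
```

Because the assigned end values are $D^+F(\eta_N) = 1$ and $D^+D^+F(\eta_N) = 0$, the extended functions are continuous at $\eta = L_N$.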

4 Numerical Experiments for Blasius’ Problem 

In [3] a limiting value for suction is found at $f_0 = 7.07$ and for blowing at
$f_0 = -0.875745$. In numerical experiments to illustrate the proposed technique, we
take the representative values $f_0 = 3$ and $f_0 = 6$ for suction; $f_0 = -0.25$ and
$f_0 = -0.5$ for blowing.

We want to determine error estimates for the approximations $\bar F$, $\overline{D^+F}$ and
$\overline{D^+D^+F}$ to $f$, $f'$ and $f''$, respectively, for all $N \ge 2048$. Consequently, we
take the mesh with $N_0 = 65536$ subintervals to be the finest mesh on which we solve
Blasius' problem. Using the experimental numerical technique described in [2] we
determine the following computed error estimates
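$f_0 = 3$:

    $\|\bar F - f\|_{[0,\infty)} \le 2.505\,N^{-0.86}$
    $\|\overline{D^+F} - f'\|_{[0,\infty)} \le 1.452\,N^{-0.86}$
    $\|\overline{D^+D^+F} - f''\|_{[0,\infty)} \le 20.427\,N^{-0.84}$

$f_0 = 6$:

    $\|\bar F - f\|_{[0,\infty)} \le 2.635\,N^{-0.86}$
    $\|\overline{D^+F} - f'\|_{[0,\infty)} \le 2.925\,N^{-0.86}$
    $\|\overline{D^+D^+F} - f''\|_{[0,\infty)} \le 65.927\,N^{-0.81}$

$f_0 = -0.25$:

    $\|\bar F - f\|_{[0,\infty)} \le 1.066\,N^{-0.86}$
    $\|\overline{D^+F} - f'\|_{[0,\infty)} \le 0.202\,N^{-0.86}$
    $\|\overline{D^+D^+F} - f''\|_{[0,\infty)} \le 0.453\,N^{-0.86}$

$f_0 = -0.5$:

    $\|\bar F - f\|_{[0,\infty)} \le 0.603\,N^{-0.85}$
    $\|\overline{D^+F} - f'\|_{[0,\infty)} \le 0.345\,N^{-0.87}$
    $\|\overline{D^+D^+F} - f''\|_{[0,\infty)} \le 0.488\,N^{-0.86}$.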


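Similarly, the computed error estimates for the approximations $\eta\overline{D^+F}(\eta) - \bar F(\eta)$,
$\eta\overline{D^+D^+F}(\eta)$ and $\eta^2\overline{D^+D^+F}(\eta)$ to $(\eta f' - f)(\eta)$, $\eta f''(\eta)$ and $\eta^2 f''(\eta)$,
respectively, for all $N \ge 2048$, are

$f_0 = 3$:

    $\|(\eta\overline{D^+F} - \bar F) - (\eta f' - f)\|_{[0,\infty)} \le 2.505\,N^{-0.86}$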
\\r^{D+D+F - 

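    $\|\eta^2(\overline{D^+D^+F} - f'')\|_{[0,\infty)} \le 0.7\,N^{-0.86}$

$f_0 = 6$:

    $\|(\eta\overline{D^+F} - \bar F) - (\eta f' - f)\|_{[0,\infty)} \le 2.635\,N^{-0.86}$
    $\|\eta(\overline{D^+D^+F} - f'')\|_{[0,\infty)} \le 3.297\,N^{-0.86}$
    $\|\eta^2(\overline{D^+D^+F} - f'')\|_{[0,\infty)} \le 0.745\,N^{-0.86}$

$f_0 = -0.25$:

    $\|(\eta\overline{D^+F} - \bar F) - (\eta f' - f)\|_{[0,\infty)} \le 1.066\,N^{-0.86}$
    $\|\eta(\overline{D^+D^+F} - f'')\|_{[0,\infty)} \le 1.178\,N^{-0.86}$
    $\|\eta^2(\overline{D^+D^+F} - f'')\|_{[0,\infty)} \le 3.275\,N^{-0.86}$

$f_0 = -0.5$:

    $\|(\eta\overline{D^+F} - \bar F) - (\eta f' - f)\|_{[0,\infty)} \le 1.228\,N^{-0.86}$
    $\|\eta(\overline{D^+D^+F} - f'')\|_{[0,\infty)} \le 1.670\,N^{-0.86}$
    $\|\eta^2(\overline{D^+D^+F} - f'')\|_{[0,\infty)} \le 5.952\,N^{-0.86}$.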

We see from the above computed error estimates that, in all cases and at each
point of $[0, \infty)$, the orders of convergence with respect to $N$, the number of mesh
intervals used to solve Blasius' problem, are not less than 0.81. Similarly, in all
cases, the error constants are at most 65.927. The worst cases occur for $f_0 = 6$.
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5 Numerical Experiments for Prandtl’s Problem 

In this section we find reference numerical solutions of Prandtl's problem and
computed error estimates for the scaled numerical solutions and their
derivatives. In all of the numerical computations we use the specific values
$a = 0.1$, $A = 1.1$, $B = 1.0$.

We construct the approximations $\mathbf{U}_B = (U_B, V_B)$ of the velocity
components $\mathbf{u}_P$ of the self-similar solution of Prandtl's problem
$(P_P)$ by substituting the approximate expressions $F$ and $D^+F$ for $f$ and
$f'$, respectively, into (1) and (2). Thus, for each $(x, y)$ in the open
quarter plane $\{(x,y) : x > 0,\ y > 0\}$ we have

$$U_B(x,y) = D^+F(\eta) \qquad (14)$$

$$V_B(x,y) = \sqrt{\frac{1}{2x\,Re}}\,\bigl(\eta D^+F(\eta) - F(\eta)\bigr).$$

We call $\mathbf{U}_B = (U_B, V_B)$ the reference numerical solution of the
self-similar solution of Prandtl's problem $(P_P)$.

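We now assume that error estimates for the scaled approximations
$(U_B, \sqrt{Re}\,V_B)$ to $(u_P, \sqrt{Re}\,v_P)$, of the form

$$\|U_B - u_P\|_{\overline{\Omega}} < C_1 N^{-p_1}$$

$$\sqrt{Re}\,\|V_B - v_P\|_{\overline{\Omega}} < C_2 N^{-p_2}$$

are valid for all $N > N_0$, where $p_1 > 0$, $p_2 > 0$, and the constants
$N_0, p_1, p_2, C_1, C_2$ are independent of the total number of iterations $M$
and the number of mesh intervals $N$ used in the numerical solution of Blasius'
problem.

The errors in the $x$-component $U_B$ and the scaled $y$-component
$\sqrt{Re}\,V_B$ of the velocity corresponding to $M > 8\ln N$ satisfy

$$\|U_B - u_P\|_{\overline{\Omega}} = \|D^+F - f'\|_{[0,\infty)}$$

$$
\begin{aligned}
\sqrt{Re}\,\|V_B - v_P\|_{\overline{\Omega}}
 &= \Bigl\|\tfrac{1}{\sqrt{2x}}\bigl[(\eta D^+F(\eta) - F(\eta)) - (\eta f' - f)\bigr]\Bigr\|_{\overline{\Omega}}\\
 &< \sqrt{5}\,\|(\eta D^+F(\eta) - F(\eta)) - (\eta f' - f)\|_{[0,\infty)}.
\end{aligned}
$$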

Then, using the experimental numerical technique described in [2] and the
computed error estimates for the numerical solutions of Blasius' problem in the
previous section, we obtain for all $N > 2048$ the following computed error
estimates for the reference numerical solutions of Prandtl's problem

$f_0 = 3$

$$
\begin{aligned}
\|U_B - u_P\|_{\overline{\Omega}} &< 1.452\,N^{-\ldots}\\
\sqrt{Re}\,\|V_B - v_P\|_{\overline{\Omega}} &< 5.601\,N^{-0.86}
\end{aligned}
$$

$f_0 = 6$

$$
\begin{aligned}
\|U_B - u_P\|_{\overline{\Omega}} &< 2.925\,N^{-\ldots}\\
\sqrt{Re}\,\|V_B - v_P\|_{\overline{\Omega}} &< 5.89\,N^{-0.86}
\end{aligned}
$$

$f_0 = -0.25$

$$
\begin{aligned}
\|U_B - u_P\|_{\overline{\Omega}} &< 0.202\,N^{-0.86}\\
\sqrt{Re}\,\|V_B - v_P\|_{\overline{\Omega}} &< 2.38\,N^{-0.86}
\end{aligned}
$$

$f_0 = -0.5$

$$
\begin{aligned}
\|U_B - u_P\|_{\overline{\Omega}} &< 0.345\,N^{-0.87}\\
\sqrt{Re}\,\|V_B - v_P\|_{\overline{\Omega}} &< 1.35\,N^{-0.86}
\end{aligned}
$$

We see from these computed error estimates that, in all cases, the orders
of convergence with respect to $N$, the number of mesh intervals used to solve
Blasius' problem, are at least 0.86. Similarly, in all cases, the error
constants are at most 5.89. The worst case occurs for $f_0 = 6$.
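
The link between the Blasius and Prandtl constants can be checked with a few lines of arithmetic. The sketch below assumes that $x$ ranges over $[a, A]$ with $a = 0.1$, so that the factor $1/\sqrt{2x}$ is bounded by $1/\sqrt{2a} = \sqrt{5}$; the dictionary and variable names are illustrative only.

```python
import math

a = 0.1  # smallest value of x, taken from the parameter values quoted in the text

# Constants C in ||(eta D+F - F) - (eta f' - f)|| < C N^-0.86 (Blasius section)
blasius_const = {3: 2.505, 6: 2.635}

# Largest value of 1/sqrt(2x) for x >= a; this is the sqrt(5) in the bound
factor = 1.0 / math.sqrt(2.0 * a)

# Resulting constants for sqrt(Re) * ||V_B - v_P||
prandtl_const = {f0: factor * c for f0, c in blasius_const.items()}

for f0, c in prandtl_const.items():
    print(f"f0 = {f0}: constant for the scaled V_B error is about {c:.3f}")
```

For $f_0 = 3$ and $f_0 = 6$ this reproduces the quoted constants 5.601 and 5.89 up to rounding.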

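Substituting the appropriate expressions into (4), (5), (6) and (7) we obtain
the approximations $D_xU_B$, $D_yU_B$, $D_xV_B$, $D_yV_B$ to the first order
derivatives of the velocity components of the self-similar solution of
Prandtl's problem $(P_P)$, where

$$D_yU_B(\eta(x,y)) = \sqrt{\frac{Re}{2x}}\,D^+D^+F(\eta)$$

$$D_xV_B(\eta(x,y)) = -\frac{1}{2x\sqrt{2x\,Re}}\,\bigl(\eta D^+F(\eta) - F(\eta) + \eta^2 D^+D^+F(\eta)\bigr)$$

$$D_yV_B(\eta(x,y)) = \frac{\eta}{2x}\,D^+D^+F(\eta)$$

$$D_xU_B(\eta(x,y)) = -D_yV_B(\eta(x,y)).$$

From the computed error estimates for the numerical solutions of Blasius'
problem, in the previous section, we obtain for all $N > 2048$ the following
computed error estimates for the reference scaled discrete derivatives of the
velocity components

$$\frac{1}{\sqrt{Re}}\,\Bigl\|D_yU_B - \frac{\partial u_P}{\partial y}\Bigr\|_{\overline{\Omega}} < \sqrt{5}\,\|D^+D^+F(\eta) - f''(\eta)\|_{[0,\infty)}$$

$$\Bigl\|D_yV_B - \frac{\partial v_P}{\partial y}\Bigr\|_{\overline{\Omega}} = \Bigl\|D_xU_B - \frac{\partial u_P}{\partial x}\Bigr\|_{\overline{\Omega}} = \Bigl\|\frac{\eta}{2x}\bigl(D^+D^+F(\eta) - f''(\eta)\bigr)\Bigr\|_{\overline{\Omega}}$$

$$\sqrt{Re}\,\Bigl\|D_xV_B - \frac{\partial v_P}{\partial x}\Bigr\|_{\overline{\Omega}} < \frac{1}{2x\sqrt{2x}}\Bigl(\|(\eta D^+F - F)(\eta) - (\eta f' - f)(\eta)\| + \|\eta^2 (D^+D^+F(\eta) - f''(\eta))\|\Bigr).$$

Then, for all $N > 2048$ we obtain the following estimates

$f_0 = 3$

$$
\begin{aligned}
\tfrac{1}{\sqrt{Re}}\,\bigl\|D_yU_B - \tfrac{\partial u_P}{\partial y}\bigr\|_{\overline{\Omega}} &< 5.676\,N^{-\ldots}\\
\bigl\|D_yV_B - \tfrac{\partial v_P}{\partial y}\bigr\|_{\overline{\Omega}} &< 9\,N^{-0.85}\\
\sqrt{Re}\,\bigl\|D_xV_B - \tfrac{\partial v_P}{\partial x}\bigr\|_{\overline{\Omega}} &< 35.831\,N^{-0.86}
\end{aligned}
$$

$f_0 = 6$

$$
\begin{aligned}
\tfrac{1}{\sqrt{Re}}\,\bigl\|D_yU_B - \tfrac{\partial u_P}{\partial y}\bigr\|_{\overline{\Omega}} &< \ldots\\
\bigl\|D_yV_B - \tfrac{\partial v_P}{\partial y}\bigr\|_{\overline{\Omega}} &< 16.49\,N^{-0.85}\\
\sqrt{Re}\,\bigl\|D_xV_B - \tfrac{\partial v_P}{\partial x}\bigr\|_{\overline{\Omega}} &< 37.78\,N^{-0.86}
\end{aligned}
$$

$f_0 = -0.25$

$$
\begin{aligned}
\bigl\|D_yV_B - \tfrac{\partial v_P}{\partial y}\bigr\|_{\overline{\Omega}} &< 5.89\,N^{-0.86}\\
\sqrt{Re}\,\bigl\|D_xV_B - \tfrac{\partial v_P}{\partial x}\bigr\|_{\overline{\Omega}} &< 48.52\,N^{-0.86}
\end{aligned}
$$

$f_0 = -0.5$

$$
\begin{aligned}
\tfrac{1}{\sqrt{Re}}\,\bigl\|D_yU_B - \tfrac{\partial u_P}{\partial y}\bigr\|_{\overline{\Omega}} &< 1.09\,N^{-0.86}\\
\bigl\|D_yV_B - \tfrac{\partial v_P}{\partial y}\bigr\|_{\overline{\Omega}} &< 8.35\,N^{-0.86}\\
\sqrt{Re}\,\bigl\|D_xV_B - \tfrac{\partial v_P}{\partial x}\bigr\|_{\overline{\Omega}} &< 73.3\,N^{-0.86}.
\end{aligned}
$$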

We see from these computed error estimates that, in all cases, the orders
of convergence with respect to $N$, the number of mesh intervals used to solve
Blasius' problem, are at least 0.85. Similarly, in all cases, the error
constants are at most 73.3. The worst order of convergence occurs for
$f_0 = 6$ and the worst error constant for $f_0 = -0.5$.
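
As a rough consistency check of the derivative constants, the following sketch assumes that the bound for $\sqrt{Re}\,\|D_xV_B - \partial v_P/\partial x\|$ is the sum of the two scaled Blasius error constants multiplied by $1/(2a\sqrt{2a})$ with $a = 0.1$. This reading of the bound, and all names in the code, are assumptions made for illustration.

```python
import math

a = 0.1  # smallest value of x, from the parameter values quoted in the text

# Blasius constants quoted earlier for f0 = 3 and f0 = 6
c_scaled = {3: 2.505, 6: 2.635}  # ||(eta D+F - F) - (eta f' - f)||
c_eta2 = {3: 0.7, 6: 0.745}      # ||eta^2 (D+D+F - f'')||

# Largest value of 1/(2x sqrt(2x)) for x >= a
factor = 1.0 / (2.0 * a * math.sqrt(2.0 * a))

# Triangle-inequality bound on the scaled error in D_x V_B
dxv_bound = {f0: factor * (c_scaled[f0] + c_eta2[f0]) for f0 in c_scaled}

for f0, c in dxv_bound.items():
    print(f"f0 = {f0}: constant for the scaled D_x V_B error is about {c:.2f}")
```

Under these assumptions the computed values agree with the quoted constants 35.831 and 37.78 to within about 0.01.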
Remark on Navier-Stokes' Problem. It is well known that incompressible flow
past a plate $P = \{(x, 0) \in \mathbb{R}^2 : x > 0\}$ with suction-blowing in
the domain $D = \mathbb{R}^2 \setminus P$ is governed by the Navier-Stokes
equations

$(P_{NS})$: Find $\mathbf{u}_{NS} = (u_{NS}, v_{NS})$, $p_{NS}$ such that for
all $(x,y) \in D$, $\mathbf{u}_{NS}$ satisfies the differential equations

$$-\frac{1}{Re}\Delta \mathbf{u}_{NS} + \mathbf{u}_{NS}\cdot\nabla \mathbf{u}_{NS} = -\frac{1}{\rho}\nabla p_{NS}$$

$$\nabla\cdot\mathbf{u}_{NS} = 0$$

with the boundary conditions

$$u_{NS}(x,0) = 0,\quad v_{NS}(x,0) = v_0(x)\quad \text{for all } x > 0$$

$$\lim_{|y|\to\infty}\mathbf{u}_{NS}(x,y) = \lim_{x\to-\infty}\mathbf{u}_{NS}(x,y) = (1, 0),\quad \text{for all } x \in \mathbb{R}$$

where $\mathbf{u}_{NS}$ is the velocity of the fluid, $Re$ is the Reynolds
number, $\rho$ is the density of the fluid and $p_{NS}$ is the pressure. This
is a nonlinear system of equations for the unknowns $\mathbf{u}_{NS}$,
$p_{NS}$. It is known that the solution of $(P_P)$ is a good approximation to
the solution of $(P_{NS})$ in a subdomain excluding the leading edge region,
provided that the flow remains laminar and no separation occurs. Moreover, as
$Re$ increases the difference between the solutions of problems $(P_P)$ and
$(P_{NS})$ decreases. This means that the reference solution of Prandtl's
problem is the leading term in the solution of the above Navier-Stokes problem.



6 Conclusion 

For the problem of incompressible laminar flow past a thin flat plate with
suction-blowing we construct a new numerical method for computing reference
numerical solutions to the self-similar solution of the related Prandtl problem.
The method generates global numerical approximations to the velocity components
and their scaled derivatives for arbitrary values of the Reynolds number in the
range $[1,\infty)$ on a domain including the boundary layer but excluding a
neighbourhood of the leading edge. The method is based on Blasius' approach.
Using an experimental error estimate technique it is shown that these numerical
approximations are pointwise accurate and that they satisfy pointwise error
estimates which are independent of the Reynolds number for the flow. The
Reynolds-uniform order of convergence of the reference numerical solutions,
with respect to the number of mesh subintervals used in the solution of
Blasius' problem, is at least 0.86 and the error constant is not more than 80.
The number of iterations required to solve the nonlinear Blasius problem is
independent of the Reynolds number. Therefore the method generates reference
numerical solutions with $Re$-uniform errors of any prescribed accuracy.

Abstract. Solving the algebraic linear systems arising from the discretization
of 2D singularly perturbed problems on condensed meshes is a difficult task. In
this work we present numerical experiments obtained with the multigrid method
for this class of linear systems. On Shishkin meshes, the classical multigrid
algorithm is not convergent. We show that, by modifying only the restriction
operator in an appropriate way, the algorithm becomes convergent, the CPU time
increases linearly with the discretization parameter and the number of cycles
is independent of the mesh sizes.



1 Introduction 

In this paper we are interested in the application of multigrid techniques to
solve the algebraic linear systems arising from the discretization of singularly
perturbed problems on Shishkin meshes. We consider problems of type

$$
\begin{aligned}
&L_\varepsilon u \equiv -\varepsilon\Delta u + \mathbf{b}\cdot\nabla u + cu = f, \quad \text{in } \Omega = (0,1)^2,\\
&u = 0 \ \text{on } \Gamma_D, \qquad \frac{\partial u}{\partial n} = 0 \ \text{on } \Gamma_N,
\end{aligned}
\qquad (1)
$$

where $\Gamma = \partial\Omega = \Gamma_D \cup \Gamma_N$ and
$0 < \varepsilon < 1$. We assume that $\mathbf{b}$, $c$ and $f$ are sufficiently
smooth functions satisfying enough compatibility conditions, with $c > 0$.
Depending on the value of the convection term, it is known [8] that the exact
solution of (1) can present regular and/or parabolic layers. In all cases,
classical schemes on uniform meshes give a reliable numerical solution only if
a very large number of mesh points is taken [8]. To solve this type of problem
efficiently, $\varepsilon$-uniformly convergent schemes are needed.

In recent years, schemes based on a priori fitted meshes (see [8] and references
therein) have been commonly used for the numerical approximation of the solution
of problems of type (1). Among the different possibilities, Shishkin meshes
[10,11] seem the most adequate because they can be easily constructed. On these
meshes, classical numerical schemes are in many cases uniformly convergent [8].
Nevertheless, since the ratio between the mesh sizes is very large for
$\varepsilon$ sufficiently small, the solution of the associated linear systems
is difficult [9]. The BI-CGSTAB algorithm [12] is generally an efficient method
when the number of mesh points is not very large. In general, for large linear
systems, the multigrid technique is a good alternative [1,5,13], but efficient
multigrid has not yet been achieved for singular perturbation problems on
Shishkin meshes. In [4] we showed that standard multigrid is not convergent
when the numerical schemes are constructed on Shishkin meshes. We also saw
that, with a suitably modified restriction operator, the resulting algorithm is
very efficient. In this paper we present the modified restriction operator,
adapted for use on Shishkin grids. We apply this multigrid algorithm to a hybrid
scheme solving a convection-diffusion problem with regular layers and to a high
order scheme for a problem with regular and parabolic layers. Another approach
that leads to efficient multigrid methods for singular perturbation problems is
presented in [6] and [14]. In this approach the smoother is changed to an
incomplete line LU relaxation method (ILLU), which makes classical multigrid
more robust. Finally, we would like to mention that algebraic multigrid methods
may also lead to robust solvers for the problem considered here.

* This research was supported by the projects DGES-PB97-1013 and P226-68



2 The Multigrid Algorithm 



All components of the multigrid that we consider, except the restriction
operator, are standard components [13]: the smoother is a line Gauss-Seidel of
alternating symmetric type, the prolongation operator is bilinear
interpolation, and the coefficient matrices of the linear systems constructed at
each level of the algorithm are obtained by discretizing, with the finite
difference scheme, the differential equation on the corresponding grid. Let
$G_l$, $G_{l-1}$ be the spaces of grid functions defined on the meshes of level
$l$ and $l-1$ of the multigrid algorithm, respectively. The restriction operator
$R_l^{l-1}$ is a linear mapping

$$R_l^{l-1} : G_l \to G_{l-1}, \qquad r^l \mapsto R_l^{l-1} r^l = r^{l-1},$$

which maps fine-grid functions onto coarse-grid functions. It can be represented
by the stencil

$$
R_l^{l-1} =
\begin{bmatrix}
\rho_{-1,1} & \rho_{0,1} & \rho_{1,1}\\
\rho_{-1,0} & \rho_{0,0} & \rho_{1,0}\\
\rho_{-1,-1} & \rho_{0,-1} & \rho_{1,-1}
\end{bmatrix},
$$

which describes the formula

$$r^{l-1}(x_i, y_j) = \sum_{m,n=-1}^{1} \rho_{m,n}\, r^l(x_i + m h^x_m,\ y_j + n h^y_n),$$

where $h^x_{-1} = x_i - x_{i-1}$, $h^x_1 = x_{i+1} - x_i$,
$h^y_{-1} = y_j - y_{j-1}$, $h^y_1 = y_{j+1} - y_j$, $h^x_0 = h^y_0 = 0$. To
define a general restriction operator we proceed as follows. Let $V_P$ be a
molecule centered at the point $P = (x_i, y_j)$ on the fine grid. The residual
associated with this molecule is calculated by

$$Q_{V_P}(r^l) = \sum_{P_i \in V_P} a_{P_i}\, r^l_{P_i}, \qquad (2)$$

where $a_{P_i}$ are the weights of a quadrature formula. The restricted residual
on the coarse grid at $P$, $r^{l-1}_P$, is given by the following discrete
conservation equality

$$(\text{area } V_P)\, r^{l-1}_P = Q_{V_P}(r^l). \qquad (3)$$

Using the composite trapezoidal rule on uniform meshes, we obtain the most
commonly used restriction operator, the full weighting operator, defined by the
stencil

$$
R_l^{l-1} = \frac{1}{16}
\begin{bmatrix}
1 & 2 & 1\\
2 & 4 & 2\\
1 & 2 & 1
\end{bmatrix}.
\qquad (4)
$$
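
The action of the full weighting stencil (4) can be illustrated with a minimal sketch on a uniform grid. The function name `full_weighting`, the list-of-lists storage and the choice to leave coarse boundary values at zero are assumptions for illustration, not the authors' implementation.

```python
def full_weighting(fine):
    """Restrict a (2n+1) x (2n+1) fine-grid array to an (n+1) x (n+1) coarse grid.

    Interior coarse values are weighted averages of the nine surrounding fine
    values with the stencil (1/16) [1 2 1; 2 4 2; 1 2 1]. Coarse boundary values
    are left at zero (an assumption suited to homogeneous Dirichlet residuals).
    """
    stencil = [[1, 2, 1], [2, 4, 2], [1, 2, 1]]
    size = len(fine)
    if size < 3 or size % 2 == 0:
        raise ValueError("fine grid must have an odd number of points, at least 3")
    n_coarse = (size - 1) // 2 + 1
    coarse = [[0.0] * n_coarse for _ in range(n_coarse)]
    for ci in range(1, n_coarse - 1):
        for cj in range(1, n_coarse - 1):
            i, j = 2 * ci, 2 * cj  # fine-grid indices of the coarse point
            total = 0.0
            for m in (-1, 0, 1):
                for n in (-1, 0, 1):
                    total += stencil[m + 1][n + 1] * fine[i + m][j + n]
            coarse[ci][cj] = total / 16.0
    return coarse


# A linear fine-grid function is reproduced exactly at interior coarse points.
fine_grid = [[float(i + 2 * j) for j in range(5)] for i in range(5)]
coarse_grid = full_weighting(fine_grid)
```

Because the weights are symmetric and sum to one, constant and linear grid functions are reproduced at interior coarse points.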



To define a different operator (see [4] for details), we only modify the
quadrature formula used to calculate the residual. For the $x$ direction
(similarly for the $y$ direction), we use the composite trapezoidal rule when
the step sizes associated with the point $P$ are equal, i.e.,
$h^x_{-1} = h^x_1$. Otherwise, when $h^x_{-1} \neq h^x_1$, the formula is






^,3 



2+1, i 






(5) 



Thus, the 2D quadrature formula is the product of the corresponding 1D formulas.



3 A Hybrid Difference Scheme 

In this section we want to approximate the solution of problem (1) supposing
that $\mathbf{b} = (b_1, b_2) > (\beta_1, \beta_2) > (0,0)$. In this case, since
there are regular layers at $x = 1$ and $y = 1$, the Shishkin mesh $\Omega_N$
is constructed as follows. Let $N > 4$ be an even number. We define the
transition parameters

$$\sigma_x = \min\{1/2,\ \sigma_{0,x}\,\varepsilon\log N\}, \qquad \sigma_y = \min\{1/2,\ \sigma_{0,y}\,\varepsilon\log N\}, \qquad (6)$$

where $\sigma_{0,x} > 1/\beta_1$, $\sigma_{0,y} > 1/\beta_2$ are constants to be
chosen later. Taking $N/2+1$ uniformly distributed points in the intervals
$[0, 1-\sigma_x]$ and $[0, 1-\sigma_y]$, and also $N/2+1$ equally spaced points
in $[1-\sigma_x, 1]$ and $[1-\sigma_y, 1]$, we obtain the grid as the tensor
product of the corresponding one-dimensional meshes. We see that if
$\varepsilon$ is large enough, the mesh is uniform; otherwise, the points
concentrate in the regular layer region, with only two different step sizes in
each direction, given by $H_x = 2(1-\sigma_x)/N$, $h_x = 2\sigma_x/N$,
$H_y = 2(1-\sigma_y)/N$, $h_y = 2\sigma_y/N$. We note that, for each $N$, only
the finest grid is of Shishkin type; the grid associated with level $l-1$ has
step sizes, in each direction, which are double the corresponding step sizes of
the previous level $l$.
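
The one-dimensional construction above can be sketched as follows. The function name `shishkin_mesh_1d`, the use of the natural logarithm and the sample parameter values are assumptions for illustration; the two-dimensional mesh is the tensor product of two such meshes.

```python
import math


def shishkin_mesh_1d(N, eps, sigma0):
    """Piecewise-uniform mesh on [0, 1], refined near x = 1.

    Returns (points, sigma, H, h): N + 1 mesh points, the transition parameter,
    the coarse step on [0, 1 - sigma] and the fine step on [1 - sigma, 1].
    """
    if N < 2 or N % 2 != 0:
        raise ValueError("N must be a positive even number")
    sigma = min(0.5, sigma0 * eps * math.log(N))  # natural log assumed here
    H = 2.0 * (1.0 - sigma) / N
    h = 2.0 * sigma / N
    half = N // 2
    coarse = [i * H for i in range(half + 1)]                   # N/2 + 1 points
    fine = [(1.0 - sigma) + i * h for i in range(1, half + 1)]  # N/2 more points
    return coarse + fine, sigma, H, h


points, sigma, H, h = shishkin_mesh_1d(N=32, eps=1e-4, sigma0=1.0)
```

For large `eps` the minimum in the transition parameter returns 1/2 and the two step sizes coincide, which gives the uniform mesh mentioned in the text.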
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In the sequel, we denote hf = Xi — = yj — yj-i, D~ , the 

backward, forward and central difference discretizations of the first derivative 
respectively, and D~ D'^ , the second order central difference discretization, and 
similarly for the variable y. We define the following hybrid difference operator 
to approximate the first-order derivative: 

j for 0 < i < iV/2, 

^ \ for N/2<i< N, 

and analogously we can define D^. On we consider the scheme 

Lf t/.., = -e{D~Dt + D~D+)U.,, + b,,, • 

= (7) 

Ui^j = o, on r^ = rnf2^. (8) 



This scheme is uniformly convergent with order 1 (see [7]). Considering the
restriction operator given in Section 2, we obtain four different expressions
depending on where the point is located. In the algorithm, these operators need
to be calculated only once, at the beginning. This is a great simplification of
the code in contrast with general nonuniform grids. Defining the following sets
of points,



^ = {(1 ay)}, 

= {(x„ 1 - uj,), z = 0, . . . , iv} \ 

= {(1 _ y,), j = 0, . . . ,iV} \ 

= n\ u u 

the operators are given by 



0 


(2 


0'x)^y 




^ X 


Gy 








0 


(2- 


ax)(2 - a 


y) 


o-x(2 


- 0-y) 


, if {xi 


yj) e 


(9) 


0 




0 




0 










Uy 


2a y 




Gy 










2 


— Gy 


2(2 — CFy) 


2 


— Gy 


, if 


(,xi,yj) e 




(10) 




0 


0 




0 










0 


2- 


^X 














0 


2(2- 


(Tx) ^CTx 


5 


if {xi,Vj) 






(11) 


0 


2- 


^x ^x 















and by (4) if (xi,yj) € 17’’. To see the good properties of the new multigrid 
method, we solve the problem 



—eAu + Ux + Uy = f, in 17, (12) 

u = 0, on r, 

where / is such that the exact solution is given by u{x,y) = xy{e^^~^'^ ^ — 
— 1). We show the results on 32^, 64^, 128^ and 256^ Shishkin meshes 
for some values of $\varepsilon$ sufficiently small. In [4] we saw that for
large values of the diffusion parameter $\varepsilon$ the new restriction
operator is less efficient than the full weighting operator. Table 1 shows the
spectral radius $\rho$, the number of iterations needed to obtain a residual of
$10^{-\ldots}$ and, within brackets, the corresponding wall-clock time. We also
show the discrete maximum norm of the global discretization error, i.e.,
$\|e\|_\infty = \max_{i,j} |u(x_i,y_j) - U_{ij}|$, $i, j = 0, 1, \ldots, N$.
This table illustrates the first-order convergence of the discretization given
by the hybrid scheme, a linear growth of the CPU time and also the independence
of the spectral radius with respect to the size of the mesh. Thus, we conclude
that the method has all the expected good properties of the multigrid technique.



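Table 1. Error, spectral radius, number of cycles and CPU time

| Grid | Quantity | $\varepsilon = 10^{-\ldots}$ | $\varepsilon = 10^{-\ldots}$ | $\varepsilon = 10^{-\ldots}$ | $\varepsilon = 10^{-\ldots}$ | $\varepsilon = 10^{-\ldots}$ |
|---|---|---|---|---|---|---|
| 32 x 32 | $\|e\|_\infty$ | 5.97D-2 | 5.99D-2 | 5.99D-2 | 5.99D-2 | 5.99D-2 |
| | $\rho$ | 0.03 | 0.03 | 0.03 | 0.03 | 0.03 |
| | cycles (CPU) | 6 (0.69) | 7 (0.82) | 8 (0.93) | 8 (0.93) | 9 (1.05) |
| 64 x 64 | $\|e\|_\infty$ | 3.10D-2 | 3.12D-2 | 3.13D-2 | 3.13D-2 | 3.13D-2 |
| | $\rho$ | 0.04 | 0.04 | 0.04 | 0.04 | 0.04 |
| | cycles (CPU) | 7 (3.17) | 8 (3.61) | 8 (3.61) | 9 (4.05) | 10 (4.49) |
| 128 x 128 | $\|e\|_\infty$ | 1.56D-2 | 1.58D-2 | 1.58D-2 | 1.58D-2 | 1.58D-2 |
| | $\rho$ | 0.04 | 0.05 | 0.05 | 0.05 | 0.05 |
| | cycles (CPU) | 7 (12.78) | 8 (14.57) | 9 (16.35) | 10 (18.13) | 11 (19.90) |
| 256 x 256 | $\|e\|_\infty$ | 7.69D-3 | 7.88D-3 | 7.90D-3 | 7.90D-3 | 7.90D-3 |
| | $\rho$ | 0.12 | 0.06 | 0.06 | 0.06 | 0.06 |
| | cycles (CPU) | 11 (80.70) | 8 (59.15) | 9 (66.40) | 11 (80.78) | 12 (88.20) |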



4 A High Order Scheme on a Shishkin Mesh 



Now we consider the problem

$$
\begin{aligned}
&-\varepsilon\Delta u + u_x = \sin(\pi x)\sin(\pi y), \quad \text{in } \Omega = (0,1)^2,\\
&u(0,y) = 0,\quad u(1,y) = 1,\quad y \in [0,1],\\
&\frac{\partial u(x,0)}{\partial n} = \frac{\partial u(x,1)}{\partial n} = 0,\quad x \in [0,1].
\end{aligned}
\qquad (13)
$$

In this case a regular layer at $x = 1$ and two parabolic layers at $y = 0$ and
$y = 1$ appear in the solution. Thus, to construct the Shishkin mesh we take
(see [8]) the transition parameters

$$\sigma_x = \min\{1/2,\ \sigma_{0,x}\,\varepsilon\log N\}, \qquad \sigma_y = \min\{1/4,\ \sigma_{0,y}\,\sqrt{\varepsilon}\log N\}, \qquad (14)$$

and we define a piecewise uniform mesh with $N/2+1$ points in $[0, 1-\sigma_x]$
and $[1-\sigma_x, 1]$, $N/4+1$ points in $[0, \sigma_y]$ and $[1-\sigma_y, 1]$
and $N/2+1$ points in $[\sigma_y, 1-\sigma_y]$. Again, for $\varepsilon$ large
the mesh is uniform and otherwise we have two different step sizes for each
space direction, given by $H_x = 2(1-\sigma_x)/N$, $h_x = 2\sigma_x/N$,
$H_y = 2(1-2\sigma_y)/N$, $h_y = 4\sigma_y/N$. Considering the following sets of
points

f^N.i = {{xi,yj) G f2 : 0 < i < N/2}, , , 

nN,2 = {{x,,y,)Gf2:N/2<i<N}, 



the scheme that we use (see [2] for details of the construction) is given by 

— i + l “ QN{fi,j)y 

{.Xi^yj) G Ctv , 
(16) 



where rf , and rf , are 

‘'tJ 



r^ = 



-2e 



r^ = 



-2e 






and the remaining coefficients are defined, depending on where the point is, as: 
1 _ -2g - h,jK 12. 2 _ -2g 



^,3 J /j: 









^Ij = - rlj - rlj + QN{fi,j) = Qhifij)^ (xi,yj) e ^n,i 



-2e 



■ 



rh = 



-2e 



{hf + hf^,)hf + (hf + hf+i) 



'X I LX 



4+1 



^Ij = -rlj - rlj - + - +■ + hj, QNifij) = fij, (xi,yj) G Cjv.2- 

This method is uniformly convergent with order 3/2 for $\varepsilon$
sufficiently small (see [2]). Now, we have six different restriction operators
depending on where the point is located in the mesh. Distinguishing the
following sets of points



'^ = {(1 -ax,l- <Xy)}, 
= {(1 - <Xx,cry)}, 
= {(x„ 1 - aj,), i = 0, . . . , iV} \ 

12"-" = {(x„ay),i = 0,...,iV}\12", 
= {(1 _ = 0, . . . ,iV} \ {+- U 12-}, 

12’' = 12 \ {12“’i— U 12^’- U U 12i— U 12-}, 

the operators are given by 
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Ra 



1 

4 



0 0 0 
1 - ay 2(1 -ay) I -ay , 



if (Xi,yj) e 



(20) 



by (11) if (xi,yj) G 17^ and by (4) if (xi,yj) G 17’'. Since we do not know 
the exact solution, we estimate the errors by = maxij \ I) b J = 

0, 1, . . . iV, where is the approximation on the mesh = {{xi,yj),i,j = 
0,1,..., 2A^} defined as 



(X2i,y2j) = {xi,yj) G 17", i,j = Q,l,...,N 

/ \ ,Xi Xi+I t/j + J/j + 1 \ . . ^ , AT ^ 

[X 2 i+i,y 2 j+l) = ( ;r , ;r ), l, J = 0,1, . . . , N - I, 



and the numerical order of convergence, calculated using the double mesh
principle (see [3]), is given by $p = \log(e^N/e^{2N})/\log 2$. In Tables 2 and
3 we show the maximum point errors and the corresponding rates of convergence
of the finite difference scheme in two subdomains:
$[0, 1-\sigma_x) \times (\sigma_y, 1-\sigma_y)$ (outside of layer regions) and
$[1-\sigma_x, 1] \times [0, 1-\sigma_y]$ (in a corner layer).

Table 2. Error and convergence rates outside of layer regions

| N | Quantity | $\varepsilon = 10^{-\ldots}$ | $\varepsilon = 10^{-\ldots}$ | $\varepsilon = 10^{-\ldots}$ | $\varepsilon = 10^{-\ldots}$ | $\varepsilon = 10^{-\ldots}$ |
|---|---|---|---|---|---|---|
| 32 | error | 3.052D-3 | 3.074D-3 | 3.076D-3 | 3.076D-3 | 3.076D-3 |
| | rate | 2.009 | 2.000 | 1.999 | 1.999 | 1.999 |
| 64 | error | 7.583D-4 | 7.683D-4 | 7.693D-4 | 7.694D-4 | 7.694D-4 |
| | rate | 2.020 | 2.003 | 2.002 | 2.001 | 2.001 |
| 128 | error | 1.869D-4 | 1.917D-4 | 1.921D-4 | 1.922D-4 | 1.922D-4 |
| | rate | 2.039 | 2.005 | 2.001 | 2.001 | 2.001 |
| 256 | error | 4.547D-5 | 4.775D-5 | 4.798D-5 | 4.800D-5 | 4.800D-5 |

Table 3. Error and convergence rates in a corner layer

| N | Quantity | $\varepsilon = 10^{-\ldots}$ | $\varepsilon = 10^{-\ldots}$ | $\varepsilon = 10^{-\ldots}$ | $\varepsilon = 10^{-\ldots}$ | $\varepsilon = 10^{-\ldots}$ |
|---|---|---|---|---|---|---|
| 32 | error | 6.025D-3 | 5.548D-3 | 5.398D-3 | 5.350D-3 | 5.335D-3 |
| | rate | 1.498 | 1.505 | 1.507 | 1.508 | 1.509 |
| 64 | error | 2.133D-3 | 1.955D-3 | 1.899D-3 | 1.881D-3 | 1.875D-3 |
| | rate | 1.589 | 1.561 | 1.552 | 1.548 | 1.547 |
| 128 | error | 7.090D-4 | 6.625D-4 | 6.477D-4 | 6.431D-4 | 6.416D-4 |
| | rate | 1.646 | 1.628 | 1.621 | 1.619 | 1.619 |
| 256 | error | 2.266D-4 | 2.144D-4 | 2.105D-4 | 2.093D-4 | 2.089D-4 |

From these two tables, we deduce that the discretization scheme has order 2
outside the layer regions, while in the corner layer the order is approximately
1.5, in agreement with the theoretical results (see [2]).
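
The double mesh formula for the numerical order of convergence can be checked directly against the tabulated errors. The sketch below uses the first-column errors of Tables 2 and 3; the function name `double_mesh_order` and the dictionary names are illustrative only.

```python
import math


def double_mesh_order(e_N, e_2N):
    """Numerical order of convergence p = log(e^N / e^2N) / log 2."""
    return math.log(e_N / e_2N) / math.log(2.0)


# First-column maximum errors from Table 2 (outside the layers)
# and Table 3 (corner layer), indexed by N.
errors_outside = {32: 3.052e-3, 64: 7.583e-4, 128: 1.869e-4, 256: 4.547e-5}
errors_corner = {32: 6.025e-3, 64: 2.133e-3, 128: 7.090e-4, 256: 2.266e-4}

rates_outside = {N: double_mesh_order(errors_outside[N], errors_outside[2 * N])
                 for N in (32, 64, 128)}
rates_corner = {N: double_mesh_order(errors_corner[N], errors_corner[2 * N])
                for N in (32, 64, 128)}

for N in (32, 64, 128):
    print(N, round(rates_outside[N], 3), round(rates_corner[N], 3))
```

The computed rates stay close to 2 outside the layers and between about 1.5 and 1.65 in the corner layer, matching the tabulated rates to within rounding.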

To see the efficiency of the new multigrid method, we compare the results
with those obtained using the BI-CGSTAB method for the value
$\varepsilon = 10^{-\ldots}$. In Table 4 we show the number of iterations and
the CPU time for each of these methods. From these results, we see that the
methods are comparable for meshes with few points, but when the number of
points increases, the number of multigrid iterations hardly increases. Also,
the CPU time increases linearly for multigrid and more rapidly for the
BI-CGSTAB method.

Table 4. Number of iterations and CPU time for $\varepsilon = 10^{-\ldots}$

| N | 32 x 32 | 64 x 64 | 128 x 128 | 256 x 256 | 512 x 512 |
|---|---|---|---|---|---|
| BI-CGSTAB | 9 (0.19) | 12 (0.80) | 21 (4.85) | 39 (37.36) | 123 (403.64) |
| MULTIGRID | 3 (0.4) | 3 (1.44) | 4 (7.49) | 4 (29.88) | 5 (148.26) |
Abstract. The effective use of the cache memories of the processors is
a key component in obtaining high-performance algorithms and codes,
including algorithms and codes for parallel computers with shared
and distributed memories. Recursive algorithms seem to be a tool
for achieving this. Unfortunately, the widely used programming language
FORTRAN 77 does not allow explicit recursion.

The paper presents a recursive version of the LU factorization algorithm
for general matrices using FORTRAN 90. FORTRAN 90 allows writing
recursive procedures, and the recursion is handled automatically by
the compiler. Usually, recursion speeds up the algorithms. The recursive
versions reported in the paper are modifications of the LAPACK
algorithms, and they transform some basic linear algebra operations from
BLAS level 2 to BLAS level 3.

Keywords: numerical linear algebra, recursive algorithms, FORTRAN
90, LU factorization
AMS Subject Classifications: 65F05, 65Y10 



1 Introduction 

The data flow from the memory to the computational units is the most critical
part in the problem of constructing high-speed algorithms. The functional units
have to work very close to their peak capacity. The registers (very high-speed
memory) communicate directly with a small, very fast cache memory. This
memory is a form of storage that is automatically filled and emptied according
to a fixed scheme defined by the hardware system. The cache memory is a buffer
between the processor and the main memory. It is many times faster than the
main memory. Therefore, the effective use of the cache memory is a key
component in designing high-performance numerical algorithms [3]. One way of
solving this problem is to use recursive algorithms. Unfortunately, the widely
used programming language FORTRAN 77 does not allow explicit recursion, and
writing recursive algorithms in this language is a very difficult task. FORTRAN
90/95 support recursion as a language feature [6]. Recursion leads to automatic
variable blocking for linear algebra problems with dense coefficient matrices
[5]. The algorithms reported in this paper are modifications of well known
LAPACK algorithms [1] in which BLAS level 2 subroutines are transformed into
level 3. The rest of the paper is organized as follows. Section 2 describes the
recursive version of the LU factorization algorithm. Sections 3 and 4 present
the recursive versions of the subroutines for matrix-matrix multiplication and
for solving systems of linear equations with triangular coefficient matrices,
which are needed inside the recursive LU algorithm.

K. Georgiev and J. Wasniewski
L. Vulkov, J. Wasniewski, and P. Yalamov (Eds.): NAA 2000, LNCS 1988, pp. 325-332, 2001.
© Springer-Verlag Berlin Heidelberg 2001

2 Recursively Partitioned LU Factorization 

The algorithm factors an $m \times n$ matrix $A$ into an $m \times n$ lower
trapezoidal matrix $L$ (the upper triangular part is all zeros) with 1's on the
main diagonal and an $n \times n$ upper triangular matrix $U$ in the case
$m > n$ (Fig. 1), and into an $m \times m$ lower triangular matrix $L$ with 1's
on the main diagonal and an $m \times n$ upper trapezoidal matrix $U$ (the lower
triangular part is all zeros) in the case $m < n$ (Fig. 2).

Fig. 1. Partitioning of the matrices in the case m > n

Let the matrix $A$ be divided into four blocks (1) (see also Fig. 1 and
Fig. 2), with $p = [\min(m, n)/2]$:

$$
\begin{pmatrix} A_{11} & A_{12}\\ A_{21} & A_{22} \end{pmatrix}
=
\begin{pmatrix} L_{11} & 0\\ L_{21} & L_{22} \end{pmatrix}
\begin{pmatrix} U_{11} & U_{12}\\ 0 & U_{22} \end{pmatrix}
=
\begin{pmatrix} L_{11}U_{11} & L_{11}U_{12}\\ L_{21}U_{11} & L_{21}U_{12} + L_{22}U_{22} \end{pmatrix}
\qquad (1)
$$

In order to obtain the entries of the matrices $L$ and $U$ the following four
subproblems have to be solved:

$$L_{11}U_{11} = A_{11} \qquad (2)$$

$$L_{11}U_{12} = A_{12} \qquad (3)$$

$$L_{21}U_{11} = A_{21} \qquad (4)$$

$$L_{21}U_{12} + L_{22}U_{22} = A_{22} \iff L_{22}U_{22} = A_{22} - L_{21}U_{12} \qquad (5)$$

Fig. 2. Partitioning of the matrices in the case m < n

The sizes of the submatrices in the case $m > n$ are as follows: $A_{11}$,
$L_{11}$ and $U_{11}$ are $p \times p$ matrices, $A_{21}$ and $L_{21}$ are
$(m-p) \times p$ matrices, $A_{12}$ and $U_{12}$ are $p \times (n-p)$ matrices,
$A_{22}$ and $L_{22}$ are $(m-p) \times (n-p)$ matrices and $U_{22}$ is an
$(n-p) \times (n-p)$ matrix. In the other case, $m < n$, the sizes of the
submatrices are as follows: $A_{11}$, $L_{11}$ and $U_{11}$ are $p \times p$
matrices, $A_{21}$ and $L_{21}$ are $(m-p) \times p$ matrices, $A_{12}$ and
$U_{12}$ are $p \times (n-p)$ matrices, $A_{22}$ and $U_{22}$ are
$(m-p) \times (n-p)$ matrices and $L_{22}$ is an $(m-p) \times (m-p)$ matrix.
There are standard LAPACK (GETRF) [1,4] and BLAS (TRSM, GEMM) [2] subroutines
for solving these problems. Following the main idea, i.e. to work within the
cache memory, recursive versions of them will be used here. The recursive
algorithms for matrix-matrix multiplication and for solving systems of linear
equations with triangular (lower or upper) coefficient matrices will be
described in the next sections. The corresponding recursive algorithms and
subroutines are RGETRF, RTRSM and RGEMM, respectively. RGETRF is used for
solving (2) and (5). RTRSM is used to solve problems (3) and (4), while RGEMM
is used to obtain the right-hand side of (5). A high-level description of the
recursive LU factorization algorithm is given below.

RECURSIVE SUBROUTINE RGETRF( A, IPIV, INFO )
! Use Statements:
   USE LA_PRECISION, ONLY: WP => DP
   USE LA_AUXMOD, ONLY: ERINFO, LSAME
   USE F90_RCF, ONLY: RLUGETRF => RGETRF, RTRSM, RGEMM
   USE F77_LAPACK, ONLY: GETRF_F77 => LA_GETRF
! Purpose:
! RGETRF computes an LU factorization of a general M-by-N matrix
! A using partial pivoting with row interchanges.
! The factorization has the form
!    A = P * L * U
! where P is a permutation matrix, L is lower triangular with unit
! diagonal elements (lower trapezoidal if M > N), and U is upper
! triangular (upper trapezoidal if M < N).
! This is the right-looking Level 3 BLAS version of the algorithm.
! Other subroutines used: RGEMM, RTRSM, DLASWP
! Remark: The parameter N_CASH shows how many double precision
! real numbers can be put in the "cache memory"
   M = SIZE(A,1)
   N = SIZE(A,2)
   MN_MIN = MIN(M,N)
   MEMORY = M*N
   IF( MEMORY <= N_CASH .OR. MN_MIN == 1 ) THEN
! Call the standard Fortran '90 routine LA_DGETRF
   ELSE
      P = MN_MIN/2
      CALL RLUGETRF( A(1:P,1:P), IPIV=LIPIV, INFO=LINFO )
      MN_LOC = MIN(N-P,P)
      CALL DLASWP( N-P, A(1:P,P+1:N), P, 1, MN_LOC, LIPIV, 1 )
      CALL RTRSM( A(1:P,1:P), A(1:P,P+1:N), UPLO='L', SIDE='L', DIAG='U' )
      CALL RTRSM( A(1:P,1:P), A(P+1:M,1:P), UPLO='U', SIDE='R' )
      CALL RGEMM( A(P+1:M,1:P), A(1:P,P+1:N), A(P+1:M,P+1:N), &
                  ALPHA=-ONE, CASH=N_CASH )
      CALL RLUGETRF( A(P+1:M,P+1:N), IPIV=LIPIV, INFO=LINFO )
      MN_LOC = MIN(P,M-P)
      CALL DLASWP( P, A(P+1:M,1:P), M-P, 1, MN_LOC, LIPIV, 1 )
   END IF
END SUBROUTINE RGETRF
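
The recursion through subproblems (2)-(5) can be illustrated by a small self-contained sketch. It is deliberately simplified: it handles only square matrices, uses no pivoting (so it assumes that no zero pivot occurs, e.g. for diagonally dominant matrices), and recurses down to 1 x 1 blocks instead of switching to a library routine at cache size. All function names are hypothetical and are not those of the Fortran package.

```python
def matmul(A, B):
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def solve_lower_unit(L, B):
    """Solve L X = B for X, with L unit lower triangular (subproblem (3))."""
    X = [row[:] for row in B]
    for i in range(len(L)):
        for k in range(i):
            for j in range(len(X[0])):
                X[i][j] -= L[i][k] * X[k][j]
    return X


def solve_upper_right(U, B):
    """Solve X U = B for X, with U upper triangular (subproblem (4))."""
    X = [row[:] for row in B]
    for row in X:
        for j in range(len(U)):
            for k in range(j):
                row[j] -= row[k] * U[k][j]
            row[j] /= U[j][j]
    return X


def recursive_lu(A):
    """Return (L, U) with A = L U for a square matrix A, without pivoting."""
    n = len(A)
    if n == 1:
        return [[1.0]], [[A[0][0]]]
    p = n // 2
    A11 = [row[:p] for row in A[:p]]
    A12 = [row[p:] for row in A[:p]]
    A21 = [row[:p] for row in A[p:]]
    A22 = [row[p:] for row in A[p:]]
    L11, U11 = recursive_lu(A11)            # (2): L11 U11 = A11
    U12 = solve_lower_unit(L11, A12)        # (3): L11 U12 = A12
    L21 = solve_upper_right(U11, A21)       # (4): L21 U11 = A21
    prod = matmul(L21, U12)
    S = [[A22[i][j] - prod[i][j] for j in range(n - p)] for i in range(n - p)]
    L22, U22 = recursive_lu(S)              # (5): L22 U22 = A22 - L21 U12
    L = [r + [0.0] * (n - p) for r in L11] + [L21[i] + L22[i] for i in range(n - p)]
    U = [U11[i] + U12[i] for i in range(p)] + [[0.0] * p + r for r in U22]
    return L, U


A = [[4.0, 1.0, 2.0], [1.0, 5.0, 1.0], [2.0, 1.0, 6.0]]
L, U = recursive_lu(A)
```

The Fortran routine additionally applies the row interchanges of partial pivoting (the DLASWP calls) and stops the recursion once the block fits in cache.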

3 RGEMM: A Recursive Algorithm for Matrix-Matrix 
Multiplication 

RGEMM is a recursive version of the BLAS routine GEMM. RGEMM performs, in a
recursive way, one of the following operations:

$$C := \alpha\, op(A)\, op(B) + \beta C, \qquad (6)$$

where $op(X) = X$ or $op(X) = X^T$. Here, $op(A)$ is an $M \times K$ matrix,
$op(B)$ is a $K \times N$ matrix, $C$ is an $M \times N$ matrix, and $\alpha$
and $\beta$ are scalars. Thus we can perform the following three types of
actions:

$$C = \beta C + \alpha AB \qquad (7)$$

$$C = \beta C + \alpha AB^T \qquad (8)$$

$$C = \beta C + \alpha A^TB \qquad (9)$$

If the arrays are too large to fit in the cache memory of the processor, then
we divide each of the matrices $A$, $B$ and $C$ into four blocks as follows. If
$n_{min} = \min(m, k, n)$ then $p = [n_{min}/2]$, and in the case (7) of the
above mentioned types of actions the dimensions of the blocks are:

$A_{11}$ ($p \times p$), $A_{12}$ ($p \times (k-p)$), $A_{21}$ ($(m-p) \times p$), $A_{22}$ ($(m-p) \times (k-p)$)

$B_{11}$ ($p \times p$), $B_{12}$ ($p \times (n-p)$), $B_{21}$ ($(k-p) \times p$), $B_{22}$ ($(k-p) \times (n-p)$)

$C_{11}$ ($p \times p$), $C_{12}$ ($p \times (n-p)$), $C_{21}$ ($(m-p) \times p$), $C_{22}$ ($(m-p) \times (n-p)$).

In the case (8) the dimensions are:

$A_{11}$ ($p \times p$), $A_{12}$ ($p \times (k-p)$), $A_{21}$ ($(m-p) \times p$), $A_{22}$ ($(m-p) \times (k-p)$)

$B_{11}$ ($p \times p$), $B_{12}$ ($p \times (k-p)$), $B_{21}$ ($(n-p) \times p$), $B_{22}$ ($(n-p) \times (k-p)$)

$C_{11}$ ($p \times p$), $C_{12}$ ($p \times (n-p)$), $C_{21}$ ($(m-p) \times p$), $C_{22}$ ($(m-p) \times (n-p)$).

And in the case (9) the dimensions are:

$A_{11}$ ($p \times p$), $A_{12}$ ($p \times (m-p)$), $A_{21}$ ($(k-p) \times p$), $A_{22}$ ($(k-p) \times (m-p)$)

$B_{11}$ ($p \times p$), $B_{12}$ ($p \times (n-p)$), $B_{21}$ ($(k-p) \times p$), $B_{22}$ ($(k-p) \times (n-p)$)

$C_{11}$ ($p \times p$), $C_{12}$ ($p \times (n-p)$), $C_{21}$ ($(m-p) \times p$), $C_{22}$ ($(m-p) \times (n-p)$).

These formulas clearly lead to eight new problems of the same type but with
matrices of smaller dimensions. In the case (7) they are:

$$C_{11} = \beta C_{11} + \alpha(A_{11}B_{11} + A_{12}B_{21}), \qquad C_{12} = \beta C_{12} + \alpha(A_{11}B_{12} + A_{12}B_{22}),$$

$$C_{21} = \beta C_{21} + \alpha(A_{21}B_{11} + A_{22}B_{21}), \qquad C_{22} = \beta C_{22} + \alpha(A_{21}B_{12} + A_{22}B_{22}).$$

In the other two cases the formulas are similar. Therefore, we have eight
recursive calls to the same algorithm. When the blocks become small enough, the
standard Fortran 90 subroutine GEMM is used to solve the problem with the
matrices stored in the cache memory of the processor. A high-level description
of RGEMM is given below.

RECURSIVE SUBROUTINE RGEMM ( A, B, C, ALPHA, BETA, TRA, TRB, CASH) 

! Use Statements : 

USE F90_BLAS, ONLY: LA_GEMM 

USE F90_RCF, ONLY: RCFGEMM => RGEMM 

! Other parameters: 

! TRA and TRB - specify the operation to be performed 

! TRA = ’N’ => op( A ) = A, TRA = ’T’ => op( A ) = k’ 

! TRB = ’N’ => op( B ) = B, TRB = ’T’ => op( B ) = B’ 

! Other subroutines used: DSYRK_90, RGEMM, ERINFO 

IF( LSAMECLTRA, ’NO ) THEN 

M = SIZE(A,1); K = SIZE(A,2) 

ELSE 

M = SIZE(A,2); K = SIZE(A,1) 

END IF 

IF( LSAMECLTRB, ’NO ) THEN 
N = SIZE(B,2) 

ELSE 

N = SIZE(B,1) 

END IF 
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MEMORY = M*K + K*N + M*N; N_MIN = MIN(M,N,K) 

IF( MEMORY <= CASH .OR. N_MIN == 1) THEN 
! Call the standard Fortran ’90 routine LA_GEMM 

call LA_GEMM( A, B, C, TRA=LTRA , TRB=LTRB , ALPHA=LAL ) 

ELSE 

P = N_MlN/2 

1F( LSAME(LTRA, ’N’) .AND. LSAME(LTRB, ’N’ ) ) THEN 
call RCFGEMM(A(1:P,1:P) ,B(1:P,1:P) ,C(1:P,1:P) , ...) 
call RCFGEMM(A(1:P,P+1:K) ,B(P+1:K,1:P) ,C(1:P,1:P) , ...) 
call RCFGEMM(A(1:P,1:P) ,B(1:P,P+1:N) ,C(1:P,P+1:N) , ...) 
call RCFGEMM(A(1:P,P+1:K) ,B(P+1:K,P+1:N) ,C(1:P,P+1:N) , ...) 
call RCFGEMM(A(P+1:M,1:P) ,B(1:P,1:P) ,C(P+1:M,1:P) , ...) 
call RCFGEMM(A(P+1:M,P+1:K) ,B(P+1:K,1:P) ,C(P+1:M,1:P) , ...) 
call RCFGEMM(A(P+1:M,1:P) ,B(1:P,P+1:N) ,C(P+1:M,P+1:N) , ...) 
call RCFGEMM(A(P+1:M,P+1:K) ,B(P+1:K,P+1:N) ,C(P+1:M,P+1:N) , ...) 

END IF 

1F( LSAME(LTRA, ’N’) .AND. LSAME(LTRB, ’T’ ) ) THEN 
call RCFGEMM(A(1:P,1:P) ,B(1:P,1:P) ,C(1:P,1:P) , ...) 
call RCFGEMM(A(1:P,P+1:K) ,B(1:P,P+1:K) ,C(1:P,1:P) , ...) 
call RCFGEMM(A(1:P,1:P) ,B(P+1:N,1:P) ,C(1:P,P+1:N) , ...) 
call RCFGEMM(A(1:P,P+1:K) ,B(P+1:N,P+1:K) ,C(1:P,P+1:N) , ...) 
call RCFGEMM(A(P+1:M,1:P) ,B(1:P,1:P) ,C(P+1:M,1:P) , ...) 
call RCFGEMM(A(P+1:M,P+1:K) ,B(1:P,P+1:K) ,C(P+1:M,1:P) , ...) 
call RCFGEMM(A(P+1:M,1:P) ,B(P+1:N,1:P) ,C(P+1:M,P+1:N) , ...) 
call RCFGEMM(A(P+1:M,P+1:K) ,B(P+1:N,P+1:K) ,C(P+1:M,P+1:N) , ...) 

END IF 

IF( LSAME(LTRA, 'T') .AND. LSAME(LTRB, 'N') ) THEN
call RCFGEMM(A(1:P,1:P) ,B(1:P,1:P) ,C(1:P,1:P) , ...) 
call RCFGEMM(A(P+1:K,1:P) ,B(P+1:K,1:P) ,C(1:P,1:P) , ...) 
call RCFGEMM(A(1:P,1:P) ,B(1:P,P+1:N) ,C(1:P,P+1:N) , ...) 
call RCFGEMM(A(P+1:K,1:P) ,B(P+1:K,P+1:N) ,C(1:P,P+1:N) , ...) 
call RCFGEMM(A(1:P,P+1:M) ,B(1:P,1:P) ,C(P+1:M,1:P) , ...) 
call RCFGEMM(A(P+1:K,P+1:M) ,B(P+1:K,1:P) ,C(P+1:M,1:P) , ...) 
call RCFGEMM(A(1:P,P+1:M) ,B(1:P,P+1:N) ,C(P+1:M,P+1:N) , ...) 
call RCFGEMM(A(P+1:K,P+1:M) ,B(P+1:K,P+1:N) ,C(P+1:M,P+1:N) , ...) 

END IF 
END IF 
END IF 

END SUBROUTINE RGEMM 
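
The recursion above can be illustrated with a short sketch. It is written in Python rather than Fortran 90 and is not the authors' code: the function name `rgemm`, the index-range arguments and the default `cache` threshold are assumptions made for illustration, and only the non-transposed case is covered. Each block of C receives two products; the first applies the scaling by beta, the second accumulates with beta = 1.

```python
def rgemm(a, b, c, alpha=1.0, beta=1.0, cache=16, rows=None, inner=None, cols=None):
    """Recursive C <- beta*C + alpha*A*B on lists of lists, updated in place.

    rows, inner and cols are half-open index ranges selecting the current
    blocks of A (rows x inner), B (inner x cols) and C (rows x cols).
    """
    if rows is None:
        rows = (0, len(a))
    if inner is None:
        inner = (0, len(b))
    if cols is None:
        cols = (0, len(b[0]))
    m = rows[1] - rows[0]
    k = inner[1] - inner[0]
    n = cols[1] - cols[0]
    # Same stopping test as in the pseudo-code: the blocks fit into the
    # "cache", or one of the dimensions has shrunk to 1.
    if m * k + k * n + m * n <= cache or min(m, n, k) == 1:
        for i in range(*rows):
            for j in range(*cols):
                s = sum(a[i][l] * b[l][j] for l in range(*inner))
                c[i][j] = beta * c[i][j] + alpha * s
        return
    p = min(m, n, k) // 2
    row_parts = [(rows[0], rows[0] + p), (rows[0] + p, rows[1])]
    inner_parts = [(inner[0], inner[0] + p), (inner[0] + p, inner[1])]
    col_parts = [(cols[0], cols[0] + p), (cols[0] + p, cols[1])]
    for rp in row_parts:
        for cp in col_parts:
            # Eight recursive calls in total: two per block of C.
            rgemm(a, b, c, alpha, beta, cache, rp, inner_parts[0], cp)
            rgemm(a, b, c, alpha, 1.0, cache, rp, inner_parts[1], cp)
```

A call such as `rgemm(a, b, c, alpha=2, beta=3, cache=8)` overwrites `c` with `3*c + 2*a*b`.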
4 RTRSM: A Recursive Algorithm for Solving Systems of
Linear Equations with Triangular Coefficient Matrices 

RTRSM is a recursive version of the BLAS routine TRSM. RTRSM solves systems of linear equations with triangular (lower or upper) coefficient matrices, i.e. it performs one of the following operations

$\mathrm{op}(A)\,X = \alpha B$ or $X\,\mathrm{op}(A) = \alpha B$ (10)

in a recursive way, where $\mathrm{op}(A)$ is an $m \times m$ triangular matrix ($\mathrm{op}(A) = A$ or $\mathrm{op}(A) = A^T$), $\alpha$ is a scalar, and $X$ and $B$ are $m \times n$ matrices.

Let $p = [m/2]$ and, for simplicity, let us look only at the first possible operation in (10), i.e. $AX = \alpha B$. We divide the matrix $A$ into four blocks, $A_{11}(1:p, 1:p)$, $A_{12}(1:p, p+1:m)$, $A_{21}(p+1:m, 1:p)$, $A_{22}(p+1:m, p+1:m)$, and the matrices $X$ and $B$ into two blocks each: $X_1(1:p, 1:n)$, $X_2(p+1:m, 1:n)$ and $B_1(1:p, 1:n)$, $B_2(p+1:m, 1:n)$. If $A$ is a lower triangular matrix, then $A_{11}$ and $A_{22}$ are lower triangular too and $A_{12} = 0$. If $A$ is an upper triangular matrix, then $A_{11}$ and $A_{22}$ are upper triangular too and $A_{21} = 0$. In both cases the block algorithm consists of two recursive calls to the same algorithm for solving systems with triangular coefficient matrices and one call to a matrix-matrix multiplication, i.e. to the recursive algorithm RGEMM (see Section 3). If $A$ is a lower triangular matrix, then the block algorithm is:

$A_{11} X_1 = B_1$ (RTRSM)

$B_2 - A_{21} X_1$ (RGEMM)

$A_{22} X_2 = B_2 - A_{21} X_1$ (RTRSM),

and if $A$ is an upper triangular matrix then:

$A_{22} X_2 = B_2$ (RTRSM)

$B_1 - A_{12} X_2$ (RGEMM)

$A_{11} X_1 = B_1 - A_{12} X_2$ (RTRSM).
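
As a minimal sketch of the lower triangular case (with $\alpha = 1$), the three steps can be written as follows. The code is Python, not the Fortran 90 of the paper; the name `rtrsm_lower` and the recursion down to a single row are assumptions made for illustration, whereas the routine described here switches to LA_TRSM once the blocks fit into the cache.

```python
def rtrsm_lower(a, b, lo=0, hi=None):
    """Solve A X = B for lower triangular A; B is overwritten by X.

    Only rows lo..hi-1 of the system are treated, so the recursion
    works on index ranges instead of copying blocks.
    """
    if hi is None:
        hi = len(a)
    ncols = len(b[0])
    m = hi - lo
    if m == 1:
        # 1 x 1 system: divide the row of B by the diagonal entry.
        for j in range(ncols):
            b[lo][j] /= a[lo][lo]
        return
    p = lo + m // 2
    rtrsm_lower(a, b, lo, p)              # A11 X1 = B1
    for i in range(p, hi):                # B2 <- B2 - A21 X1
        for j in range(ncols):
            b[i][j] -= sum(a[i][k] * b[k][j] for k in range(lo, p))
    rtrsm_lower(a, b, p, hi)              # A22 X2 = B2 - A21 X1
```

The update of $B_2$ in the middle is the matrix-matrix product that RGEMM would perform in the recursive library.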

A high-level description of RTRSM is given below.

RECURSIVE SUBROUTINE RTRSM ( A, B, ALPHA, UPLO, SIDE, TRANSA, DIAG) 

! Use Statements : 

USE F90_BLAS, ONLY: LA_GEMM, LA_TRSM 
USE F90_RCF, ONLY: RCFTRSM => RTRSM, RGEMM 

IF( LSAME(LUP, 'U') .AND. LSAME(LTRA, 'N') .OR. &
    LSAME(LUP, 'L') .AND. LSAME(LTRA, 'T') ) THEN

R1 = P+1; R2 = L; S1 = 1; S2 = P

ELSE

S1 = P+1; S2 = L; R1 = 1; R2 = P

END IF

MEMORY = M*M + 2*M*N 
N_MIN = MIN(M,N)

IF( MEMORY <= CASH .OR. N_MIN == 1) THEN
! Call the standard Fortran 90 routine LA_TRSM

CALL LA_TRSM( A, B, LAL, LUP, LSIDE, LTRA, LDIAG)

ELSE IF( LSAME(LSIDE, 'L') ) THEN

CALL RCFTRSM(A(R1:R2,R1:R2), B(R1:R2,1:N), LAL, LUP, LSIDE, LTRA, LDIAG)

IF( LSAME(LTRA, 'N') ) THEN
CALL RGEMM(A(S1:S2,R1:R2), B(R1:R2,1:N), B(S1:S2,1:N), AL=-1.0, ...)
ELSE
CALL RGEMM(A(R1:R2,S1:S2), B(R1:R2,1:N), B(S1:S2,1:N), AL=-1.0, ...)
END IF

CALL RCFTRSM(A(S1:S2,S1:S2), B(S1:S2,1:N), LAL, LUP, LSIDE, LTRA, LDIAG)

ELSE

CALL RCFTRSM(A(S1:S2,S1:S2), B(1:M,S1:S2), LAL, LUP, LSIDE, LTRA, LDIAG)

IF( LSAME(LTRA, 'N') ) THEN
CALL RGEMM(B(1:M,S1:S2), A(S1:S2,R1:R2), B(1:M,R1:R2), AL=-1.0, ...)
ELSE
CALL RGEMM(B(1:M,S1:S2), A(R1:R2,S1:S2), B(1:M,R1:R2), AL=-1.0, ...)
END IF

CALL RCFTRSM(A(R1:R2,R1:R2), B(1:M,R1:R2), LAL, LUP, LSIDE, LTRA, LDIAG)

END IF

END SUBROUTINE RTRSM 
Abstract. In LAPACK we have two types of subroutines for solving problems with symmetric matrices: with full and with packed storage. The performance of the full storage scheme is much better because it allows the usage of BLAS Levels 2 and 3, while the memory requirements of the packed scheme are about half as large. Recently a new storage scheme was proposed which combines the advantages of both schemes: its performance is similar to that of full storage, and its memory requirements are only slightly higher than those of packed storage. In this paper we apply the scheme to the inversion of symmetric indefinite matrices.



1 Introduction 

Nowadays the performance of numerical algorithms depends significantly on the computer architecture. Modern processors have a hierarchical memory which, if utilized appropriately, can yield several times better performance.

One of the ways to use the different levels of memory effectively in the algorithms of numerical linear algebra is to introduce blocking in the algorithm. In this way effectively designed BLAS (Basic Linear Algebra Subroutines) [1995, p. 140] can be used, which improves the performance substantially. This is the approach adopted in LAPACK (Linear Algebra PACKage) [1995]. Many LAPACK algorithms use BLAS Level 3 (matrix-matrix operations) and Level 2 (matrix-vector operations).

* This research is supported by the UNI•C collaboration with the IBM T.J. Watson Research Center at Yorktown Heights. The last author was partially supported by Grant I-702/97 and Grant MM-707/97 from the Bulgarian Ministry of Education and Science.

In this work we consider the inversion of a matrix $A \in \mathbb{R}^{n \times n}$, where $A$ is symmetric indefinite. The most popular algorithm for this problem uses the $LDL^T$ decomposition of matrix $A$ with Bunch-Kaufman pivoting [1996, §4.4], [1996, §10.4.2]. There are two types of subroutines in LAPACK implementing this method. In the first one the matrix is stored in a two-dimensional array, and this is called full storage. For example, an 8 x 8 matrix is stored as follows:



 1  *  *  *  *  *  *  *
 2 12  *  *  *  *  *  *
 3 13 23  *  *  *  *  *
 4 14 24 34  *  *  *  *
 5 15 25 35 45  *  *  *
 6 16 26 36 46 56  *  *
 7 17 27 37 47 57 67  *
 8 18 28 38 48 58 68 78
 *  *  *  *  *  *  *  *
 *  *  *  *  *  *  *  *



where the entries denoted by a star are not referenced by the algorithm. The upper triangle of the matrix is not kept because the matrix is symmetric. The last two unreferenced rows are added to illustrate that one can choose the leading dimension of the two-dimensional array so as to achieve good level 1 cache utilization (in the above example the leading dimension is 10, and the order of the matrix is 8). For simplicity, the entries are given integer values (the positions at which the used elements are stored if the two-dimensional array is mapped to a one-dimensional array). This is enough to illustrate the two types of storage.

Practical problems can be very large, and in this case memory is an important issue. Clearly, full storage uses about twice as much memory as necessary. Therefore, a second type of storage, called packed storage, has been designed. With this type of storage we keep only the essential part of the matrix needed for the computations, in a one-dimensional array, as follows:

 1  *  *  *  *  *  *  *
 2  9  *  *  *  *  *  *
 3 10 16  *  *  *  *  *
 4 11 17 22  *  *  *  *
 5 12 18 23 27  *  *  *
 6 13 19 24 28 31  *  *
 7 14 20 25 29 32 34  *
 8 15 21 26 30 33 35 36

Clearly, packed storage needs about half the memory of full storage.
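
The two layouts can be summarized by their index maps. The following Python sketch (the function names are ours, chosen for illustration) returns the 1-based position of the lower triangular entry A(i,j), i >= j, in each scheme; it reproduces the integers shown in the two examples above for n = 8 and leading dimension 10.

```python
def full_index(i, j, ld):
    """Position of A(i,j) in column-major full storage with leading dimension ld."""
    return (j - 1) * ld + i


def packed_index(i, j, n):
    """Position of A(i,j), i >= j, in lower packed column-major storage.

    Column j is preceded by columns 1..j-1, which hold n, n-1, ..., n-j+2
    entries, and the diagonal entry is the first entry of column j.
    """
    return i + (j - 1) * (2 * n - j) // 2
```

For example, `full_index(8, 8, 10)` gives 78 and `packed_index(8, 8, 8)` gives 36, the last entries of the two displays.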

The disadvantage of full storage is that it uses more memory, but its advantage is that it allows the usage of BLAS Level 3 and Level 2 calls, which speeds up the computation substantially. For comparison, with packed storage we can use only BLAS Level 1. To illustrate this we present in Fig. 1 on the left performance results with random matrices for both types of storage. DSYTRI denotes the full storage code, and DSPTRI denotes the packed storage one. It is seen that the performance of DSYTRI is much better. Let us note that these experiments include the time for the $LDL^T$ factorization of matrix $A$.

We also measured the pure time for inversion only (not including the factorization part). The results are given in Fig. 1 on the right. It can be seen that the performance of the inversion part only is even better for the packed storage format. The explanation is that both inversion codes use BLAS Levels 1 and 2 only. So, the better performance of the whole algorithm comes from the usage of BLAS Level 3 in the factorization code _SYTRF.

Fig. 1. Performance results for full (DSYTRI, solid line) and packed (DSPTRI, dotted line) storage from factorization plus inversion (left) and inversion only (right) for different sizes n of matrix A

In the present work we use the new type of packed storage proposed in [1999]. The columns of the matrix are divided into blocks. The blocks are kept in packed storage, i.e. the blocks are stored successively in memory. Several successive columns of the matrix are then kept inside each block as if they were in full storage. As a result, this storage allows the usage of BLAS Level 3. Of course, we need slightly more memory than for the _SPTRI storage scheme, but this is about 3-5% more on average for problems of practical interest. Thus the new storage scheme combines the two advantages of the storage formats in LAPACK: the smaller memory size of _SPTRI and the better performance of _SYTRI.

Let us note that our storage scheme allows the usage of BLAS Level 3 in the inversion part of the algorithm. In _SPTRI and _SYTRI BLAS Levels 1 and 2 are used. This also improves the performance of the whole algorithm substantially.

The paper is organized as follows. In Section 2 we present the so-called overlapping scheme developed in [1999]. From [1999] one can see that this is the best scheme for problems involving symmetric indefinite matrices. In Section 3 the block inversion algorithm is presented. Finally, in Section 4 we illustrate our results by numerical tests.

2 The Block Rectangular Overlapping (BRO) Storage 
Scheme 

We assume that the $LDL^T$ factorization with Bunch-Kaufman pivoting is used in the factorization part of the algorithm. Thus we have

$PAP^T = LDL^T = WL^T$ with $W = LD$,

where $P$ is a permutation matrix, $L$ is unit lower triangular, and $D$ is block diagonal with 1 x 1 or 2 x 2 blocks. More detailed descriptions can be found in [1996, §4.4], [1996, §10.4.2]. The idea in [1999] is similar to the LAPACK _SYTRF algorithm. The columns of matrix $A$ are split into blocks, each block having $n_b$ columns. For simplicity, we assume that $n$ is a multiple of $n_b$. The results for the opposite case are the same.

We will illustrate the BRO scheme by an example. With the BRO scheme 
we would have 



First block (columns 1-3):

 1  *  *
 2 12  *
 3 13 23
 4 14 24
 5 15 25
 6 16 26
 7 17 27
 8 18 28
 *  *  *
 *  *  *

Second block (columns 4-6):

34  *  *
35 42  *
36 43 50
37 44 51
38 45 52
 *  *  *
 *  *  *

Third block (columns 7-8):

58  *
59 61
 *  *
 *  *
                                                        (1)



The elements denoted by a star are not referenced by the algorithm; they are inserted to allow blocking and the choice of a good leading dimension. For simplicity, the values of the entries show the order of the elements in the one-dimensional array in which we store the matrix. This blocking scheme leads to BLAS Level 3 only. When reaching the boundary between two blocks we can have two situations: a 1 x 1 or a 2 x 2 pivot. To better understand this, let us consider the situation when, on reaching column 3 of the matrix, we have a 2 x 2 pivot. We can see from (1) that A(5:8, 1:4) is stored in such a way that it can be accessed by BLAS Level 3 without any problems. At the same time the block A(6:8, 3:5) can also be accessed by BLAS Level 3 in case we have a 1 x 1 pivot at column 3. Thus, both situations are handled well.

The total memory we need (without taking into account the leading dimension) is estimated in [1999]:

$n(n+1)/2 + n n_b + n - n_b$, (2)

which in practice is slightly larger than the memory necessary for the LAPACK packed storage.
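
For a quick comparison of the three schemes, the element counts can be evaluated directly. This Python sketch uses expression (2) for the BRO scheme, $n(n+1)/2$ for the lower triangle in packed storage and $n^2$ for a full array without padding; the function names are ours and leading-dimension padding is ignored, as in (2).

```python
def packed_words(n):
    """Number of stored elements of the lower triangle in packed storage."""
    return n * (n + 1) // 2


def bro_words(n, nb):
    """Estimate (2): elements needed by the BRO scheme with block width nb."""
    return n * (n + 1) // 2 + n * nb + n - nb


def full_words(n):
    """Elements of an n x n two-dimensional array (no padding)."""
    return n * n
```

For the 8 x 8 example with $n_b = 3$ this gives 36 elements for packed storage and 65 for the estimate (2).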
Storage



The inversion code is based on the following fact. Let us first ignore the permutations in the matrix, and assume that $A = LDL^T$ and that $L$ and $D$ are blocked as follows:

$L = \begin{pmatrix} M & 0 \\ N & Q \end{pmatrix}, \quad D = \begin{pmatrix} D_1 & 0 \\ 0 & D_2 \end{pmatrix},$

where $M$ and $Q$ are unit lower triangular (not of equal size in general). Then for the inverse $A^{-1}$ we have

$A^{-1} = \begin{pmatrix} A_1^{-1} + W^T A_2^{-1} W & Y^T \\ Y & A_2^{-1} \end{pmatrix},$

where

$A_1^{-1} = M^{-T} D_1^{-1} M^{-1}, \quad A_2^{-1} = Q^{-T} D_2^{-1} Q^{-1},$

$W = N M^{-1}, \quad Y = -A_2^{-1} W.$

Finally, we apply the permutations stored in matrix $P$ to the rows and columns of matrix $A^{-1}$. For simplicity let us denote the permuted matrix by $A^{-1}$ again.

Now assume that $A_2^{-1}$ is already computed and stored in $Q$. Then the algorithm for computing $A^{-1}$ is given below. To show how much memory we need, we use the entries $M$, $N$, $Q$ to store the results. We also give in brackets the corresponding BLAS or LAPACK routine. A working array $W$ is introduced because it is necessary for the computation.

Step 1. Copy $N$ to $W$ (_COPY).

Step 2. Compute $W = WM^{-1}$ (_TRSM).

Step 3. Compute $N = QW$ (_SYMM).

Step 4. Compute $M = M^{-1}$ (_SYTRI).

Step 5. Compute $M = M + W^T N$ (_GEMM).

Step 6. Apply permutations in $P$ to $A^{-1}$.
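
The block formula for $A^{-1}$ that underlies these steps can be checked on a small example. The sketch below is in Python with exact rational arithmetic and is only an illustration: the helper names (`matmul`, `transpose`, `inverse`, `block_inverse`) are ours, permutations are ignored, and the blocks are inverted explicitly instead of through _TRSM, _SYMM, _SYTRI and _GEMM calls.

```python
from fractions import Fraction


def matmul(x, y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


def transpose(x):
    return [list(col) for col in zip(*x)]


def inverse(x):
    """Gauss-Jordan inverse in exact Fraction arithmetic."""
    n = len(x)
    aug = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(x)]
    for col in range(n):
        piv = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[piv] = aug[piv], aug[col]
        pv = aug[col][col]
        aug[col] = [v / pv for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def block_inverse(m, n_blk, q, d1, d2):
    """Assemble A^{-1} for A = L D L^T with L = [[M, 0], [N, Q]], D = diag(D1, D2)."""
    m_inv, q_inv = inverse(m), inverse(q)
    a1_inv = matmul(matmul(transpose(m_inv), inverse(d1)), m_inv)
    a2_inv = matmul(matmul(transpose(q_inv), inverse(d2)), q_inv)
    w = matmul(n_blk, m_inv)                                   # W = N M^{-1}
    y = [[-v for v in row] for row in matmul(a2_inv, w)]       # Y = -A2^{-1} W
    wt_a2_w = matmul(matmul(transpose(w), a2_inv), w)
    top_left = [[u + v for u, v in zip(r1, r2)] for r1, r2 in zip(a1_inv, wt_a2_w)]
    top = [r1 + r2 for r1, r2 in zip(top_left, transpose(y))]
    bottom = [r1 + r2 for r1, r2 in zip(y, a2_inv)]
    return top + bottom
```

Multiplying the assembled matrix by $A = LDL^T$ should give the identity, which is a convenient check of the block formulas.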

This scheme is simple to implement. We have a few BLAS calls and one LAPACK call. Let us point out some of the advantages of this scheme:

- Mostly BLAS Level 3 calls are used (the only exception is the call to _SYTRI, inside which Level 1 and 2 calls are used);

- We do not need additional memory for the working array W. The same storage is necessary for the factorization part, and we have already allocated this storage;

- In the present implementation _GEMM is used at Step 5, and the algorithm can be several times faster than _SYTRI and _SPTRI (see the following section). But matrix M at Step 5 is symmetric, and a special routine which takes symmetry into account can be written for this operation. Thus the flop count for this operation alone can be reduced by about half. Such a routine is not present in BLAS now, and we use _GEMM in the numerical experiments.

- During the factorization part, instead of the diagonal matrix $D$ we keep its inverse $D^{-1}$. The reason is that when solving a system of linear equations or inverting a matrix we need $D^{-1}$ only. This slightly improves the overall performance as well.

We presented only one block step of the whole algorithm. Repeating this step recursively we get the whole algorithm. The advantages given above lead to a better performance, which is illustrated in the next section.
4 Numerical Tests

The tests are done in Fortran 77 on an IBM 4-way SMP node with PowerPC 604e 332 MHz CPUs. The matrices are generated randomly with a generator which produces numbers uniformly distributed in [0,1].

We compare the performance (in Mflops) of three algorithms: DSYTRI and DSPTRI from LAPACK, and DBSTRI (the algorithm with the packed BRO storage). In Figs. 2-3 we present tests on the SMP node with 1 CPU and with 4 CPUs, respectively. In Fig. 4 we also show the speedup of all three algorithms on 4 processors. The results show that with the new packed storage scheme

- the performance is several times better than the performance of the LAPACK packed storage routine DSPTRI, while using slightly more memory and the same number of flops;

- moreover, the performance is up to 2-3 times better than the performance of the LAPACK full storage routine DSYTRI, which uses about twice as much memory;

- the speedup of the algorithm studied in this paper is larger than the speedups of the two LAPACK routines. This means that our algorithm is better suited for parallel architectures.




Fig. 2. Performance results for full (DSYSV, solid line), packed (DSPSV, dotted 
line), and BRO storage (dashed line) from factorization plus inversion (left) and 
inversion only (right) for different sizes n of matrix A on an IBM SMP node 
with 1 CPU 



Finally, in Fig. 5 we show the memory requirements for different values of n. 
For this purpose we use the expressions 

v? + n + nub, ri^ + n 



for full (_SYSV) and packed (_SPSV) storage, respectively, and (2) for the BRO storage. In _SYTRF there is also a buffer with $n_b$ columns. Therefore, the term $n n_b$ appears. The value of $n_b$ is chosen from practical experience, so that we have almost the best performance. We see that the memory requirements for the BRO scheme are much closer to the _SPSV packed storage than to the _SYSV full storage.

Fig. 3. Performance results for full (DSYSV, solid line), packed (DSPSV, dotted line), and BRO storage (dashed line) from factorization plus inversion (left) and inversion only (right) for different sizes n of matrix A on an IBM SMP node with 4 CPUs

Fig. 4. Speedup of full (DSYSV, solid line), packed (DSPSV, dotted line), and BRO storage (dashed line) from factorization plus inversion (left) and inversion only (right) for different sizes n of matrix A on an IBM SMP node with 4 CPUs
Fig. 5. Memory requirements for full (DSYSV, solid line), packed (DSPSV, dotted line), and BRO storage (dashed line)
Abstract. The stability with respect to initial data of difference schemes with operator weights is investigated in the framework of the general stability theory of operator-difference schemes. Stability is defined as the existence of a selfadjoint positive operator which determines a time-nonincreasing norm of the difference solution. Norm-independent stability criteria are obtained in the form of operator inequalities. The notion of a stability boundary in the plane of two grid parameters is introduced for multi-parameter difference schemes which approximate two-dimensional parabolic and hyperbolic differential equations. A noniterative numerical algorithm is suggested for the construction of the stability boundaries of difference schemes with variable weighting factors. The approach is based on finding the smallest eigenvalue of an auxiliary selfadjoint eigenvalue problem.



1 Introduction 

The stability theory for time-dependent operator-difference schemes in Hilbert spaces was originally suggested by A. A. Samarskii [1,2] and was developed in numerous papers (see, e.g., [3]-[6] and references therein).

Stability is understood as the existence of a selfadjoint positive operator that determines a norm in the grid space which is nonincreasing on the solution of the difference problem. This norm is connected, as a rule, with the difference scheme in question. The stability conditions obtained in this way are criteria for stability in this prescribed norm, but they turn out to be only rough sufficient conditions for stability in other norms. A set of so-called symmetrizable difference schemes was singled out in [7,8], for which theorems on norm-independent necessary and sufficient stability conditions were obtained.

In the present paper the two-layer and symmetrical three-layer difference schemes with operator weight multipliers are studied. We will consider the operator equations

$\frac{y_{n+1} - y_n}{\tau} + \sigma A y_{n+1} + (I - \sigma) A y_n = 0$, $n = 0, 1, \ldots$, $y_0$ specified, (1)

$\frac{y_{n+1} - 2y_n + y_{n-1}}{\tau^2} + \sigma A y_{n+1} + (I - 2\sigma) A y_n + \sigma A y_{n-1} = 0$, (2)

$n = 1, 2, \ldots$, $y_0$ and $y_1$ specified, where $y_n = y(t_n) \in H$ is an unknown element of the Euclidean space $H$, $t_n = n\tau$, $\tau > 0$, $A$ and $\sigma$ are linear operators in $H$, and $I$ is the identity operator. We suppose that the operators $A$ and $\sigma$ are independent of $n$. Let $(y, v)$ be an inner product in $H$, let $\|y\| = \sqrt{(y, y)}$ be the norm in $H$, and let $D : H \to H$ be a selfadjoint positive operator. Denote by $H_D$ the space $H$ with the norm $\|y\|_D = \sqrt{(Dy, y)}$.

The difference scheme (1) is called stable in the space $H_D$ (or in the norm $D$) if the inequalities $\|y_{n+1}\|_D \le \|y_n\|_D$ hold for the solution of problem (1) for arbitrary $y_0 \in H$ and $n = 0, 1, \ldots$

The stability definition for the three-layer difference scheme (2) is similar. 



2 Stability Theorems 

The theory of difference schemes with operator weight multipliers has its own features, which are expounded in detail in the book [5].

We quote here the theorem on necessary and sufficient stability conditions for scheme (1) obtained in [7]. First we represent scheme (1) in the canonical form

$B \frac{y_{n+1} - y_n}{\tau} + A y_n = 0$, $n = 0, 1, \ldots$, $y_0$ specified, (3)

where $B = I + \tau\sigma A$.

Theorem 1. Suppose that $A^* = A$, $\sigma^* = \sigma$ and the operators $A^{-1}$ and $B^{-1} = (I + \tau\sigma A)^{-1}$ exist. If scheme (1) is stable in a space $H_D$, then the operator inequality

$A + \tau A \mu A \ge 0$ (4)

holds, where $\mu = \sigma - 0.5 I$. Conversely, if (4) is fulfilled, then the difference scheme (1) is stable in $H_{A^2}$.

Proof. Multiplying (3) by the operator $A$ we have the equivalent equation

$\tilde{B} \frac{y_{n+1} - y_n}{\tau} + \tilde{A} y_n = 0$, $n = 0, 1, \ldots$, $y_0$ specified, (5)

where $\tilde{A} = A^2$ and $\tilde{B} = A + \tau A \sigma A$ are selfadjoint operators, $\tilde{A} > 0$. We have the following theorem (see [4]).

Suppose that the operators $\tilde{A}$ and $\tilde{B}$ are selfadjoint and do not depend on $n$, and the operator $\tilde{A}$ is positive. If scheme (5) is stable in a space $H_D$, then these operators satisfy the inequality

$\tilde{B} \ge 0.5 \tau \tilde{A}$. (6)

Conversely, if (6) is satisfied, then scheme (5) is stable in $H_{\tilde{A}}$.

In our case the inequality (6) has the form

$A + \tau A \sigma A \ge 0.5 \tau A^2$,

and coincides with (4).
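
In the scalar case the criterion (4) can be compared directly with the growth factor of scheme (1). The following Python sketch is only an illustration under that assumption: $A$ is replaced by a positive number `a`, $\sigma$ by a real number `sigma`, and the function names are ours.

```python
def growth_factor(a, sigma, tau):
    """Return q such that y_{n+1} = q * y_n for the scalar scheme
    (y_{n+1} - y_n)/tau + sigma*a*y_{n+1} + (1 - sigma)*a*y_n = 0."""
    return (1.0 - tau * (1.0 - sigma) * a) / (1.0 + tau * sigma * a)


def criterion(a, sigma, tau):
    """Scalar form of inequality (4): a + tau*a*(sigma - 0.5)*a >= 0."""
    return a + tau * a * (sigma - 0.5) * a >= 0.0
```

For instance, with `a = 4`, `sigma = 0` the scheme is stable for `tau = 0.1` (growth factor 0.6) and unstable for `tau = 1` (growth factor -3), in agreement with the sign of the left-hand side of (4).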
A similar result is valid for the symmetrical three-layer scheme (2).

Theorem 2. Let $A^* = A$, $\sigma^* = \sigma$ and suppose that $A^{-1}$ and $(I + \tau^2 \sigma A)^{-1}$ exist. If scheme (2) is stable in a space $H_D$, then the operator inequality

$\tau^2 A + (\tau^2 A) \mu (\tau^2 A) \ge 0$, (7)

holds, where $\mu = \sigma - 0.25 I$. Conversely, if the inequality

$\tau^2 A + (\tau^2 A) \mu (\tau^2 A) > 0$, (8)

is fulfilled, then the difference scheme (2) is stable in the norm

$\|Y_n\|_D = \left( \left\| \frac{y_n + y_{n-1}}{2} \right\|^2_{A^2} + \left\| \frac{y_n - y_{n-1}}{\tau} \right\|^2_{A + \tau^2 A \mu A} \right)^{1/2}$.

Proof. By (2) we have the equivalent equation

$A \frac{y_{n+1} - 2y_n + y_{n-1}}{\tau^2} + A\sigma A y_{n+1} + A(I - 2\sigma) A y_n + A\sigma A y_{n-1} = 0$,

which can be represented in the canonical form

$\tau^2 \tilde{R} y_{\bar{t}t,n} + \tilde{A} y_n = 0$, (9)

where $y_{\bar{t}t,n} = (y_{n+1} - 2y_n + y_{n-1})/\tau^2$, $\tilde{A} = A^2$ and $\tilde{R} = \tau^{-2} A + A\sigma A$. By a general stability theorem (A. Samarskii [2]), condition (8) is sufficient for the stability of scheme (2) in the norm mentioned above. Let us prove the necessity of condition (7). Let us represent the three-layer scheme (9) as an equivalent two-layer scheme $Y_{n+1} = S Y_n$, where $Y_n = (y_{n-1}, y_n)^T$ and $S = (S_{\alpha\beta})$ is the matrix with elements $S_{11} = 0$, $S_{12} = I$, $S_{21} = -I$, $S_{22} = 2I - \tilde{R}^{-1}\tilde{A}$. Let $s$ be an arbitrary eigenvalue of the matrix $S$ and let $y = (y^{(1)}, y^{(2)})^T$ be the corresponding eigenvector. The eigenvalue problem $Sy = sy$ can be reduced to the quadratic problem $((1 - s)^2 \tilde{R} + s\tilde{A}) y^{(1)} = 0$ or, in more detail,

$((1 - s)^2 (\tau^{-2} A + A\sigma A) + sA^2) y^{(1)} = 0$. (10)

After transforming equation (10) to the form

$A((1 - s)^2 (\tau^{-2} A^{-1} + \sigma) + sI) A y^{(1)} = 0$

and denoting $x^{(1)} = A y^{(1)}$, we have the eigenvalue problem

$(\tau^{-2} A^{-1} + \sigma) x^{(1)} = \lambda x^{(1)}$,

where $\lambda = -s/(1 - s)^2$. Note that $s = 1$ is not an eigenvalue of (10), by the assumption that $A^{-1}$ exists. Further, $\lambda$ is a real number because $\tau^{-2} A^{-1} + \sigma$ is a selfadjoint operator. If scheme (9) is stable in some space, then $|s| \le 1$ and, consequently, $\lambda \ge 0.25$. Fulfilment of the inequalities $\lambda_k \ge 0.25$ for all eigenvalues $\lambda_k$ of the selfadjoint operator $\tau^{-2} A^{-1} + \sigma$ is equivalent to the operator inequality

$\tau^{-2} A^{-1} + \sigma \ge 0.25 I$,

which is equivalent to (7).
3 Stability Boundaries of Two-Dimensional Difference
Schemes

Before the general discussion we shall treat the following example. Let us consider the first boundary-value problem for the heat conduction equation

$\frac{\partial u}{\partial t} = \frac{\partial^2 u}{\partial x_1^2} + \frac{\partial^2 u}{\partial x_2^2}$ (11)

in the rectangle $0 < x_1 < l_1$, $0 < x_2 < l_2$ with zero boundary conditions and an arbitrary initial condition.

Let us introduce the time step $\tau > 0$ and the space steps $h_1$ and $h_2$ in the directions $x_1$ and $x_2$, respectively. Let us denote

$x_1^{(i)} = i h_1$, $i = 0, 1, \ldots, N_1$, $h_1 N_1 = l_1$,

$x_2^{(j)} = j h_2$, $j = 0, 1, \ldots, N_2$, $h_2 N_2 = l_2$,

$t_n = n\tau$, $n = 0, 1, \ldots$, $\tau > 0$, $y^n_{ij} = y(x_1^{(i)}, x_2^{(j)}, t_n)$,

$y^n_{\bar{x}_1 x_1, ij} = \frac{y^n_{i-1,j} - 2y^n_{ij} + y^n_{i+1,j}}{h_1^2}$, $y^n_{\bar{x}_2 x_2, ij} = \frac{y^n_{i,j-1} - 2y^n_{ij} + y^n_{i,j+1}}{h_2^2}$,

$\Lambda_h y^n_{ij} = y^n_{\bar{x}_1 x_1, ij} + y^n_{\bar{x}_2 x_2, ij}$,

$y^n_{t,ij} = \frac{y^{n+1}_{ij} - y^n_{ij}}{\tau}$, $y^n_{\bar{t}t,ij} = \frac{y^{n+1}_{ij} - 2y^n_{ij} + y^{n-1}_{ij}}{\tau^2}$.

Let $\sigma$ be a real number and suppose that $l_1 = l_2 = 1$. Let us approximate the original differential problem by the difference one,

$y^n_{t,ij} = \sigma \Lambda_h y^{n+1}_{ij} + (1 - \sigma) \Lambda_h y^n_{ij}$,
$i = 1, 2, \ldots, N_1 - 1$, $j = 1, 2, \ldots, N_2 - 1$, $n = 0, 1, \ldots$, (12)
$y^n_{ij}|_{\Gamma_h} = 0$, $y^0_{ij} = u_0(x_1^{(i)}, x_2^{(j)})$,

where $\Gamma_h$ is the grid boundary and $u_0(x_1^{(i)}, x_2^{(j)})$ is the specified initial value. It is well known, and it follows from Theorem 1, that the inequality

$\sigma \ge \frac{1}{2} - \frac{1}{\tau \Lambda_{\max}}$ (13)

is the necessary and sufficient condition for the stability of the difference scheme (12) with respect to initial data. Here

$\Lambda_{\max} = \frac{4}{h_1^2} \cos^2 \frac{\pi}{2N_1} + \frac{4}{h_2^2} \cos^2 \frac{\pi}{2N_2}$
is the largest eigenvalue of the five-point difference Laplace operator $\Lambda_h$. The stability condition (13) can be rewritten as follows:

$4\mu(\gamma_1 a_1 + \gamma_2 a_2) + 1 \ge 0$,

where $\mu = \sigma - 0.5$ and

$\gamma_1 = \frac{\tau}{h_1^2}$, $\gamma_2 = \frac{\tau}{h_2^2}$, $a_1 = \cos^2 \frac{\pi}{2N_1}$, $a_2 = \cos^2 \frac{\pi}{2N_2}$.

If $\mu \ge 0$, then scheme (12) is stable for all $\gamma_1$ and $\gamma_2$. In the case $\mu < 0$, the stability condition assumes the form

$\frac{\gamma_1}{\gamma_{1,0}} + \frac{\gamma_2}{\gamma_{2,0}} \le 1$, (14)

where

$\gamma_{1,0} = \frac{1}{4(0.5 - \sigma) a_1}$, $\gamma_{2,0} = \frac{1}{4(0.5 - \sigma) a_2}$. (15)

Thus, in the first quadrant of the $\gamma_1 O \gamma_2$-plane the stability boundary exists, which is the segment of the straight line

$\frac{\gamma_1}{\gamma_{1,0}} + \frac{\gamma_2}{\gamma_{2,0}} = 1$. (16)

Scheme (12) is unstable at every point $(\gamma_1, \gamma_2)$ of the first quadrant situated above this straight line, and it is stable at each point below it. One can see from (15) that the stability domain enlarges when $\sigma$ increases, and it coincides with the entire first quadrant for $\sigma = 0.5$.

It follows from the expressions for $a_1$ and $a_2$ that the stability boundary depends weakly on $N_1$ and $N_2$.

4 Numerical Construction of Stability Boundaries for
Difference Schemes with Variable Weights

The theorems mentioned above enable us to carry out a numerical stability investigation of difference schemes which approximate problems of mathematical physics, and to construct the stability boundaries for such schemes.

Let us consider the problem of the numerical construction of the stability boundary for the multiparameter family of difference schemes

$\frac{y^{n+1}_{ij} - y^n_{ij}}{\tau} = \sigma_{ij} \Lambda y^{n+1}_{ij} + (1 - \sigma_{ij}) \Lambda y^n_{ij}$. (17)
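Here $\sigma_{ij}$ are given real numbers (weights) and $\Lambda$ is a five-point difference operator, namely $\Lambda = \Lambda_1 + \Lambda_2$, where

$(\Lambda_1 y)^n_{ij} = \frac{1}{h_1} \left( a_{i+1,j} \frac{y^n_{i+1,j} - y^n_{ij}}{h_1} - a_{ij} \frac{y^n_{ij} - y^n_{i-1,j}}{h_1} \right)$,

$(\Lambda_2 y)^n_{ij} = \frac{1}{h_2} \left( b_{i,j+1} \frac{y^n_{i,j+1} - y^n_{ij}}{h_2} - b_{ij} \frac{y^n_{ij} - y^n_{i,j-1}}{h_2} \right)$.

We suppose that $a_{ij} \ge c_1 > 0$, $b_{ij} \ge c_2 > 0$ for all $i$, $j$. Equation (17) depends on two grid parameters, $\gamma_1 = \tau/h_1^2$ and $\gamma_2 = \tau/h_2^2$, which determine, along with $\sigma = (\sigma_{ij})$, whether the difference scheme is stable or not.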

Let the set $\sigma = (\sigma_{ij})$ be fixed. We assume further that the weighting factors $\sigma_{ij}$ are independent of $\gamma_1$ and $\gamma_2$. We adopt the following terminology. The difference scheme (17) is said to be stable at the point $(\gamma_1, \gamma_2)$ of the $\gamma_1 O \gamma_2$-plane if it is stable for these values of $\gamma_1$ and $\gamma_2$. We assume further that $\gamma_1 > 0$, $\gamma_2 > 0$. We call the stability domain of the difference scheme (17) the set of all points in the first quadrant where the scheme is stable. Similarly, the instability domain consists of all the other points of the first quadrant.

The curve in the first quadrant $\gamma_1 > 0$, $\gamma_2 > 0$ which separates the stability and instability domains is said to be the stability boundary of the difference scheme (17). The scheme is referred to as absolutely stable if it is stable for all points $\gamma_1 > 0$, $\gamma_2 > 0$. Thus, the stability boundary does not exist for absolutely stable schemes.

For the three-layer difference scheme

$\frac{y^{n+1}_{ij} - 2y^n_{ij} + y^{n-1}_{ij}}{\tau^2} = \sigma_{ij} \Lambda y^{n+1}_{ij} + (1 - 2\sigma_{ij}) \Lambda y^n_{ij} + \sigma_{ij} \Lambda y^{n-1}_{ij}$ (18)

the notion of the stability boundary is introduced in the same way, but here $\gamma_1 = \tau^2/h_1^2$ and $\gamma_2 = \tau^2/h_2^2$.

The method of numerical construction of stability boundaries is based on passing from the Cartesian coordinates $(\gamma_1, \gamma_2)$ to the polar coordinates $(r, \varphi)$ and searching for the point of the stability boundary on the ray $\varphi = \mathrm{const}$. We first consider the two-layer difference scheme (17). It can be represented in the form (1), where $A = -\Lambda$. It is well known that $\tau A$ is a symmetric positive definite matrix. By Theorem 1, scheme (17) is stable if and only if all the eigenvalues of the stability matrix $P = (\tau A) + (\tau A)\mu(\tau A)$ are nonnegative. The sought parameters $\gamma_1$ and $\gamma_2$ enter only the matrix $\tau A$ and are not contained in the matrix $\mu$. The matrix $\tau A$ has the form

$\tau A = \gamma_1 A_1 + \gamma_2 A_2$, (19)

where $\gamma_1 = \tau/h_1^2$, $\gamma_2 = \tau/h_2^2$, $A_1 = -h_1^2 \Lambda_1$, $A_2 = -h_2^2 \Lambda_2$. It is important to note that $A_1$ and $A_2$ are independent of the grid parameters $\gamma_1$, $\gamma_2$.

Setting $\gamma_1 = r\cos\varphi$, $\gamma_2 = r\sin\varphi$, we have from (19) that $\tau A = rA_\varphi$, where

$A_\varphi = A_1 \cos\varphi + A_2 \sin\varphi$. (20)
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Thus, the sought parameter r is included in the matrix tA as a numerical factor, 
and the matrix A,^ is independent of r. Note that matrix A^p differed from tA 
only by a positive factor and therefore A^ is a symmetrical positively definite 
matrix. The stability matrix P = (tA) + (tA)^(tA) can be written in the form 
P = rPp, where 

Pp = A^ + rA^fiAp. ( 21 ) 

The search of stability boundary is based on the following characteristics of 
the matrix P^. 

Lemma 1. The difference scheme (17) is stable in the point 71 = r cos ip, 72 = 
rsin(/j if and only if all the eigenvalues of matrix P^, are nonnegative. 

Let us consider the generalized eigenvalue problem 

ApfiApX = XApX. ( 22 ) 

It turned out, that the property of having fixed sign for the spectrum of the 
problem ( 22 ) is independent of ip. 

Lemma 2. The smallest eigenvalue Amin of (22) is nonnegative if and only if 
^ 0 for all i and j. 

Corollary 1. If all cjij > 0,5, then the scheme (17) is absolutely stable. 

The next statement gives the coordinates of the point of the stability boundary situated on the ray $\gamma_2 = \gamma_1\tan\varphi$.

Theorem 3. Let at least one weighting factor $\sigma_{ij} < 0.5$. Then for all $\varphi$ the smallest eigenvalue $\lambda_{\min}(\varphi)$ of the problem (22) is negative. The point $\gamma_1 = r\cos\varphi$, $\gamma_2 = r\sin\varphi$ is situated on the stability boundary of the difference scheme (17) if and only if $r = -1/\lambda_{\min}(\varphi)$.

Corollary 2. If at least one weighting factor $\sigma_{ij} < 0.5$, then the stability boundary $r = r(\varphi)$ exists and is a single-valued function of $\varphi$.

Thus, for constructing the stability boundary of scheme (17) in the case when at least one weighting factor is less than 0.5, it is sufficient for all $\varphi \in [0, \pi/2]$ to find the smallest eigenvalue $\lambda_{\min} = \lambda_{\min}(\varphi)$ of the problem (22) and to set

$r = -1/\lambda_{\min}(\varphi)$, $\gamma_1 = r\cos\varphi$, $\gamma_2 = r\sin\varphi$.
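The formula for $r$ can be motivated by a short argument. The sketch below assumes, as the surrounding text suggests, that $A_\varphi$ is symmetric positive definite and that the weight matrix (written $\mu$ here) is symmetric; the eigenpairs $(\lambda_k, x_k)$ and the coefficients $c_k$ are notation introduced only for this illustration.

```latex
% Eigenpairs of the generalized problem (22), normalized in the A_phi inner product:
A_\varphi \mu A_\varphi x_k = \lambda_k A_\varphi x_k, \qquad x_k^T A_\varphi x_l = \delta_{kl}.
% Action of the stability matrix (21) on an eigenvector:
P_\varphi x_k = A_\varphi x_k + r A_\varphi \mu A_\varphi x_k = (1 + r\lambda_k) A_\varphi x_k .
% Quadratic form for an arbitrary x = \sum_k c_k x_k:
x^T P_\varphi x = \sum_k c_k^2 \,(1 + r\lambda_k).
% Hence P_\varphi is nonnegative iff 1 + r\lambda_k \ge 0 for all k. For r > 0 the
% binding constraint is the smallest eigenvalue; if \lambda_{\min} < 0, then
1 + r\lambda_{\min} \ge 0 \iff r \le -\frac{1}{\lambda_{\min}(\varphi)} .
```

Under these assumptions the largest admissible $r$ on the ray with angle $\varphi$ is $-1/\lambda_{\min}(\varphi)$, and no restriction on $r$ arises when $\lambda_{\min} \ge 0$.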

In the case of the three-layer scheme (18) all statements formulated above change only slightly. It is only necessary to replace the condition $\sigma_{ij} \ge 0.5$ by $\sigma_{ij} \ge 0.25$ and the condition $\sigma_{ij} < 0.5$ by $\sigma_{ij} < 0.25$. Besides, in the case of the three-layer scheme
we have $\gamma_1 = \tau^2/h_1^2$ and $\gamma_2 = \tau^2/h_2^2$. The numerical algorithm for constructing the stability boundary does not change.
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5 Support Points and the Basic Straight Line 

The computations performed in [9] demonstrate that, as a rule, the stability boundary differs only slightly from a segment of a straight line determined by the parameters $\sigma_{ij}$. This straight line crosses the stability boundary at the points $(\gamma_{10}, 0)$ and $(0, \gamma_{20})$. We call such a line the basic straight line, and the points $(\gamma_{10}, 0)$ and $(0, \gamma_{20})$ the support points.

It is possible to construct the support points by the two-dimensional algorithm described above. But in this case the algorithm can be simplified substantially and reduced to finding the stability boundaries of one-dimensional problems. Let us consider, for example, how to find the point $(\gamma_{10}, 0)$. In this case the polar angle $\varphi = 0$, and the matrices $A_\varphi = A_1$ and $A_\varphi\mu A_\varphi$ entering the main equation (22) assume a block diagonal form, namely $A_1\mu A_1 = \mathrm{diag}[M_1, M_2, \ldots, M_{N_2-1}]$, where $M_j$ are symmetric matrices of order $N_1 - 1$. Therefore the spectrum of the problem (22) is the union of the spectra of $N_2 - 1$ matrices of order $N_1 - 1$. Correspondingly, the $j$-th diagonal block $P_j$ of the stability matrix (21) contains only the elements of the $j$-th row

$\sigma^{(j)} = (\sigma_{1j}, \sigma_{2j}, \ldots, \sigma_{N_1-1,j})$

of the matrix $\sigma$.

Thus, the smallest eigenvalue of the matrix $P_j$ determines the stability boundary $\gamma_1(j)$ of a one-dimensional problem with the weight distribution $\sigma^{(j)}$. The stability boundary of the two-dimensional problem can be found as follows:

$\gamma_{10} = \min_j \gamma_1(j)$.

So, the minimal number among all $\gamma_1(j)$ corresponds to the worst one-dimensional variant along rows (the notions of the optimal and the worst variants were introduced in [10]). It follows that the support point $(\gamma_{10}, 0)$ is independent of permutations of the rows of the matrix $\sigma$. In exactly the same way, the support point $(0, \gamma_{20})$ is determined by the worst variant along the columns of the matrix $\sigma$ and is independent of permutations of the columns.

Supported by the Russian Foundation for Basic Research (grant No 99-01- 
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Abstract. The central difference scheme for reaction-diffusion prob- 
lems, when fitted Shishkin type meshes are used, gives uniformly conver- 
gent methods of almost second order. In this work, we construct HOC 
(High Order Compact) monotone finite difference schemes, defined on a priori Shishkin meshes and uniformly convergent with respect to the diffusion parameter $\varepsilon$, which have order three and four up to a logarithmic factor. We show some numerical experiments which support the
theoretical results. 



1 Introduction 

In this paper we are interested in the construction of high order compact schemes to solve problems of type

$L_\varepsilon[u] \equiv -\varepsilon u'' + b(x)u = f(x),\quad 0 < x < 1,\quad u(0) = u_0,\quad u(1) = u_1,$ (1)

where $0 < \varepsilon \le 1$, $u_0$ and $u_1$ are given constants and we suppose that $b$ and $f$ are sufficiently smooth functions with $b(x) \ge \beta > 0$, $x \in [0,1]$. The exact solution of (1) satisfies $|u^{(k)}(x)| \le C(1 + \varepsilon^{-k/2}e(x,x,\beta,\varepsilon))$, where $e(x,y,\beta,\varepsilon) = \exp(-\sqrt{\beta/\varepsilon}\,x) + \exp(-\sqrt{\beta/\varepsilon}\,(1-y))$. For sufficiently small $\varepsilon$, classical methods on uniform meshes only work for a very large number of mesh points. Nevertheless, if these methods are defined on special fitted meshes, the convergence to the exact solution is uniform in $\varepsilon$ [3,4]. Shishkin meshes [6,7] are simple piecewise uniform meshes of this kind, frequently used for singularly perturbed problems. For the reaction-diffusion problems considered, the corresponding Shishkin mesh, $I_N = \{0 = x_0, \ldots, x_N = 1\}$, is defined as follows. Let $\sigma = \min\{1/4, \sigma_0\sqrt{\varepsilon}\log N\}$, where $\sigma_0$ is a constant to be chosen later. Dividing $[0,1]$ into three intervals $[0,\sigma]$, $[1-\sigma,1]$ and $[\sigma,1-\sigma]$, we take a piecewise uniform mesh with $N/4$, $N/4$ and $N/2$ subintervals respectively. We denote $h_j = x_j - x_{j-1}$, $j = 1,\ldots,N$, and $h^{(1)} = 2(1-2\sigma)/N$, $h^{(2)} = 4\sigma/N$. In the sequel we will suppose that $\sigma = \sigma_0\sqrt{\varepsilon}\log N$ (otherwise classical analysis proves the convergence of the methods).
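The mesh construction just described can be sketched in a few lines. This is a minimal illustration, assuming the standard form $\sigma = \min\{1/4, \sigma_0\sqrt{\varepsilon}\log N\}$ of the transition parameter; the function name `shishkin_mesh` and its argument names are hypothetical and not taken from the paper.

```python
import math

def shishkin_mesh(n, eps, sigma0):
    """Piecewise uniform Shishkin mesh on [0, 1] with n subintervals.

    Assumes n is divisible by 4: n/4 subintervals in [0, sigma],
    n/2 in [sigma, 1 - sigma] and n/4 in [1 - sigma, 1].
    """
    if n % 4 != 0:
        raise ValueError("n must be divisible by 4")
    # Transition parameter (assumed standard form).
    sigma = min(0.25, sigma0 * math.sqrt(eps) * math.log(n))
    h_fine = 4.0 * sigma / n                   # step inside the layers
    h_coarse = 2.0 * (1.0 - 2.0 * sigma) / n   # step outside the layers
    quarter = n // 4
    mesh = []
    for j in range(n + 1):
        if j <= quarter:
            mesh.append(j * h_fine)
        elif j <= 3 * quarter:
            mesh.append(sigma + (j - quarter) * h_coarse)
        else:
            mesh.append(1.0 - (n - j) * h_fine)
    return mesh, sigma

# Example: 16 subintervals, eps = 1e-6, sigma0 = 4 (illustrative values).
mesh, sigma = shishkin_mesh(16, 1e-6, 4.0)
```

The two transition points are `mesh[n // 4]` and `mesh[3 * n // 4]`; these are the only nodes where consecutive steps differ.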

The finite difference schemes that we present are modifications of the central difference scheme, obtained by incorporating compact difference approximations of some terms of its local truncation error. In [5,8] this procedure was used to construct high order schemes for non singularly perturbed problems.

Henceforth, we denote by $c$, $C$ any positive constant independent of $\varepsilon$ and the mesh.

* This research was supported by the project DGES-PB97-1013



2 A Uniform Convergent Scheme with Order 3 

The central difference scheme is given by

$L^N_\varepsilon[Z_j] \equiv -\varepsilon D^+D^-Z_j + b_jZ_j = f_j,\quad 1 \le j \le N-1,\quad Z_0 = u_0,\quad Z_N = u_1,$ (2)

where $D^+D^-Z_j$ is the second order central difference discretization on non uniform meshes. Its local truncation error satisfies

$\tau_{j,u} = L^N_\varepsilon[u(x_j,\varepsilon) - Z_j] = -\varepsilon\left[\frac{h_{j+1} - h_j}{3}u'''_j + \frac{2(h_{j+1}^3 + h_j^3)}{4!\,(h_j + h_{j+1})}u^{(4)}_j\right] + O(N^{-3})$. (3)

Therefore, to construct a third order accurate (possibly non uniform) compact scheme, we must find compact approximations of the first two terms of (3) with order $O(N^{-3})$. Differentiating equation (1), we have

$-\varepsilon u''' = f' - b'u - bu'$, (4)

and therefore

$-\varepsilon u^{(4)} = f'' - b''u - 2b'u' - bu'' = f'' - b''u - 2b'u' + \frac{b(f - bu)}{\varepsilon}$. (5)



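Thus, to obtain third order approximations of both $u'''$ and $u^{(4)}$, approximations of the first derivative with order $O(N^{-2})$ are required. For the first term of (3), clearly we only need to analyze the transition points $\sigma$ and $1 - \sigma$, since $h_{j+1} = h_j$ at all other mesh points. We consider

$u'_j \approx \delta_{j,N/4}\left(D^-u_j + \frac{h_j}{2}u''_j\right) + \delta_{j,3N/4}\left(D^+u_j - \frac{h_{j+1}}{2}u''_j\right),\quad j = N/4,\ 3N/4,$

where $\delta_{jl} = 1$ if $j = l$, $\delta_{jl} = 0$ if $j \ne l$. Using the differential equation, we deduce

$u'_j \approx \delta_{j,N/4}\left(D^-u_j + \frac{h_j(b_ju_j - f_j)}{2\varepsilon}\right) + \delta_{j,3N/4}\left(D^+u_j - \frac{h_{j+1}(b_ju_j - f_j)}{2\varepsilon}\right)$. (6)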

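For the second term of (3), we approximate the first derivative depending on the sign of $b'$. Thus, we take

$u'_j \approx \operatorname{sgn} b'_j\left(D^-u_j + \frac{h_j}{2}u''_j\right) + (1 - \operatorname{sgn} b'_j)\left(D^+u_j - \frac{h_{j+1}}{2}u''_j\right)$,

where $\operatorname{sgn} z_j = 1$ if $z_j \ge 0$ and $\operatorname{sgn} z_j = 0$ if $z_j < 0$. Using again the differential equation, we use the following approximation

$u'_j \approx \operatorname{sgn} b'_j\left(D^-u_j + \frac{h_j(b_ju_j - f_j)}{2\varepsilon}\right) + (1 - \operatorname{sgn} b'_j)\left(D^+u_j - \frac{h_{j+1}(b_ju_j - f_j)}{2\varepsilon}\right)$. (7)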
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Therefore, we have obtained suitable approximations of the first two terms of (3) 
depending on $U_{j-1}$, $U_j$, $U_{j+1}$ and linear combinations of the data. Incorporating these approximations into the central difference scheme gives the following scheme:



LlN[Uj] = QUfj), l<j<N-l, Uo = uo, Un = ui, 



(8) 



where 



LInPj] = -sD+D-U, + 



hj+i - hj 



^ji^j,N/iD Uj + 5j^3j^/4^D^Uj)+ 






and 






3 U 



6(/i + h Uj -h (1 - sgn b'j)D+Uj) Q%{bj)Uj, 



,n2 ^ I ^j+3 ( J I ( X 

QnKZ 3) = Zj H 5 ( Zj + [d. 






{Oj,N/ihj — 3^/4/lj+i 



0^ 



^// - (1 - sgn b'j)hj+i) 



The local truncation error for this scheme is given by 

2 



+ 

5!(/ij -I- hj+i) 3 hj + hj+i 



-f 



-f 



hj+i hj 
3 

/14+1 -I- hj 



hi 5 






R2{X j,Xj-l,u) 
hi 



-f <5. 



'j,3N/4,- 



Rs{xj,Xj+i,u) Rs{xj,Xj-i,u) 
hj+i hj 

R2(Xj,Xj + l,u) 



h 



u! ( u! R 2 {Xj,Xj-l,u) 

-b.^sgnb, 



-i+i 



12[hj -t- hj^i) 

where $R_n$ is the remainder of Taylor's formula.



-I- (1 - sgn bj) 



, R 2 {Xj,Xj + l,u) 



h 



■j+i 



(9) 



-f 



( 10 ) 



Proposition 1. ITe suppose that bj > 0 (similar bounds can be obtained for 
bj < 0/ Let dj = {h^j + ^^+i)/((hj -I- hj+i)e). Then, 

' CN-'^al log'* N, for l<j< N/4 and 3iV/4 < j < N, 

C (n~^ + for N/4< j <3N/4, 

C l^N-^cr^elog'^ N + dAT/ 4 iV- , for j = iV/4, 

^ C (iV-3a2£log2 N + + d3N/4N-^^») , for j = 3iV/4. 

Proof. We distinguish several cases depending on the location of the mesh point. For $x_j \in (0,\sigma) \cup (1-\sigma,1)$, from (10) we deduce that



Ku\ < 



Ku\<c 



^(e{xj,Xj+i,l3,e) + e(xj-i,Xj, /3,e)) 



Using that $h^{(2)} = 4N^{-1}\sigma_0\sqrt{\varepsilon}\log N$ and bounding the exponential functions by constants, we prove that $|\tau_{j,u}| \le CN^{-4}\sigma_0^4\log^4 N$, $1 \le j < N/4$ and $3N/4 < j \le N-1$.
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For Xj G (ct, 1 — cr) we distinguish two cases. First, if < e, we easily 
obtain that 



It?, I < C 



/,(!)% /,(i)V2(e(x j,Xj+i,(3,e) + e{xj-i,Xj,!3,e)) 



< iV”"* + 

Secondly, if > £, we can prove that 

|r? I < C [ {xj+i - ^fs-‘^e{^,^,P,e)d^+ 



r^j 

y 

Jxj-i 



-Xj-ife ^e(i,^,P,£)d^ 



Integrating by parts, we deduce 

C^j+i 



[ (xj+i-^fe ‘^e{^,^,P,s)d^ < C ^^‘^fe{xj,Xj,/3,£)+ 

J Xj ^ 

fXj + 1 \ 

+ J s~^'‘^e{^,^,f3,s)dn < C{e{xj,Xj,P,e) + e{xj,xj+i,(3,e)) 

A bound for the other integral follows in the same form. Using that < 2N~^, 
we deduce that |rj„| < C{N~'^ + iV“V^'^“). Finally, we analyze the error for 
the transition point ccAr /4 = cr (similarly for X 37 V /4 = 1 — cr). If /(2e) < 
+ h^‘^^)s) < 1, then 

|t^/4^„| < £~^/'^e{xM/A-,XN/i+i,l3,£)+ 

+/i(^)/i^^)^£:“^/^e(a;Ar/4_i, xtv/4, P, e)) < C{N~^eaQ log^ N + iV“V^'^“). 

On the other hand, if 1 < + h^^^)e) < we have 

+d^/^i £"i/2e(^,^,/3,£)dC+ 

\^Xjv/4 

+ [ e"^^^e(^,^,/3,£)dC I I < C + dN/i{e{xN/4,XN/i+i,P,£)+ 

Jxn/ 4 .-\ ) ) ^ 

+e(cCiv/4-i,a;Ar/4,/3u))) < C'(A^"^crg£log^ + d7v/4^"^'^“)- 



Proposition 2. Let the operator of scheme (8) be written as $r_j^-U_{j-1} + r_j^cU_j + r_j^+U_{j+1}$. For $N$ sufficiently large, there exists a constant $c$ such that

$0 < c\max\{1, d_j\} \le r_j^- + r_j^c + r_j^+,\quad r_j^- < 0,\quad r_j^+ < 0.$ (11)
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Proof. Expression (9) can be written in the form of the statement with 



T+ — 



{hj + hj+i)hj 

-2e 

{hj + hj+i)hj+i 



r"j = -Tj - r+ + 



- W o, - sgn 6 , 

3hj 6{hj + hj+i)hj 

I I {hj+i — hj)bj {hj^i + hj)bj 



( 12 ) 

(13) 

(14) 



Using (12) and (13) it follows readily that rj < 0 and rj~ < 0. By (14), 
0 < cmax{l,dj} < r~ + r^ + r^ is equivalent to 0 < cmax{l,dj} < Q'j^{bj). 
Since 

Ohi^j) > bj + > cmax{l, d^}, 

for N sufficiently large, the result holds. 



Corollary 1. The operator (8) is of positive type and therefore it satisfies the discrete maximum principle.



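Theorem 1. Let $u(x,\varepsilon)$ be the solution of (1) and $\{U_j;\ 0 \le j \le N\}$ the solution of the scheme (8). Then

$|u(x_j,\varepsilon) - U_j| \le C\left(N^{-4}\sigma_0^4\log^4 N + N^{-3}\sigma_0^2\varepsilon\log^2 N + N^{-\sqrt{\beta}\sigma_0}\right),\quad 0 \le j \le N.$

Proof. Defining $\phi_j = C(N^{-4}\sigma_0^4\log^4 N + N^{-3}\sigma_0^2\varepsilon\log^2 N + N^{-\sqrt{\beta}\sigma_0})$, with $C$ sufficiently large, using that $r_j^- + r_j^c + r_j^+ \ge c\max\{1, d_j\}$ and the maximum principle, the result follows.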



3 A Uniform Convergent Scheme with Order 4 



From Proposition 1, if $\sqrt{\beta}\sigma_0 \ge 4$ the scheme (8) is accurate of order 4, except at the transition points when $d_j < 1$, with $d_j$ as in Proposition 1. Hence we only need to modify the scheme in this case. To do that, we write the local truncation error of the central difference scheme in the form

$-\varepsilon\left[\frac{h_{j+1} - h_j}{3}u'''_j + \frac{2(h_{j+1}^3 + h_j^3)}{4!\,(h_j + h_{j+1})}u^{(4)}_j + \frac{2(h_{j+1}^4 - h_j^4)}{5!\,(h_j + h_{j+1})}u^{(5)}_j\right] + O(N^{-4})$.

Thus, we must find approximations with order $O(N^{-4})$ of the terms in brackets. For the first term, an approximation of the first derivative is needed. We consider



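$u'_j \approx \delta_{j,N/4}\left(D^-u_j + \frac{h_j(b_ju_j - f_j)}{2\varepsilon} - \frac{h_j^2}{3!}u'''_j\right) + \delta_{j,3N/4}\left(D^+u_j - \frac{h_{j+1}(b_ju_j - f_j)}{2\varepsilon} - \frac{h_{j+1}^2}{3!}u'''_j\right)$. (15)

Combining (15) with (4), we obtain

$u'_j \approx \delta_{j,N/4}\left(D^-u_j + \frac{h_j(b_ju_j - f_j)}{2\varepsilon} - \frac{h_j^2(b_jD^+u_j + b'_ju_j - f'_j)}{3!\,\varepsilon}\right) + \delta_{j,3N/4}\left(D^+u_j - \frac{h_{j+1}(b_ju_j - f_j)}{2\varepsilon} - \frac{h_{j+1}^2(b_jD^-u_j + b'_ju_j - f'_j)}{3!\,\varepsilon}\right)$. (16)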



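For the second term, we consider the same approximations as for the scheme (8). For the last term, differentiating (5) and using (1), we can deduce

$-\varepsilon u^{(5)} = \frac{\varepsilon(f''' - b'''u - 3b''u') - 4bb'u - b^2u' + 3b'f + bf'}{\varepsilon}$.

Therefore, it is sufficient to find approximations of $b''u'$ and $b^2u'$. We propose the following ones:

$b''_ju'_j \approx b''_j\left(\delta_{j,N/4}(\operatorname{sgn} b''_j\,D^-u_j + (1 - \operatorname{sgn} b''_j)D^+u_j) + \delta_{j,3N/4}(\operatorname{sgn} b''_j\,D^+u_j + (1 - \operatorname{sgn} b''_j)D^-u_j)\right)$,

$b_j^2u'_j \approx b_j^2\left(\delta_{j,N/4}\left(D^-u_j + \frac{h_j(b_ju_j - f_j)}{2\varepsilon}\right) + \delta_{j,3N/4}\left(D^+u_j - \frac{h_{j+1}(b_ju_j - f_j)}{2\varepsilon}\right)\right)$.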



Therefore, in the transition points the scheme is Uj-i + rjUj + 

r'jUj+i = Q%{fj), j = iV/4, 3A^/4, where 






4' + 4^4 ^ “sn 



bj+1 hj 



\2ihj + hj+i) 

+ ~^^{bj3j^N/4 — hj+l5j^3ff/4) ^|^(^J.iV/4^j + 5j,3iV/4ftj + l) 



2(hj+i - hj) 
5\{hj + hj+i) 

-2e 



SbjZj bjZj bjZj 



£ e 

— h 



H + -7r£^{^j,N/4hj — hj + l53N/4,j) 



2e2 

5j,N/4bj ^ <5j,3iV/4^|+l&j 



3!e/i, 



hj{hj + hj+i) 
sgn 6) (/if + hj+i)fe) 

6hj{hj + hj+i) 

2(/tj+i - hj) f 3b" sgn b” ^ 3b" {1 - sgn b”)6j^3N/4 , ^j,N/4bj 

5\{hj + hj+i) 



shi 



+ -2e 

r ■ = 

^ bj+i{hj + hj+i) 



+ 



Sj,3N/4bj ^j,N/4h'jb'^ 



y+1 



3\shj+i 



+ 



(1 - sgn 5))(fej + fej+i)fc) 

6hj+i{hj + hj+i) 

2(/tj+i - hj) / 3b"(l - sgn fe")^j,jy/4 36" sgn b" ^j,3 jy/4 , 5j,3N/4b\ 

5!(6j + hj+i) I 6j+i hj+i 



+ 



shj+i 



C = -’"j +Qv(6j). 



+ 

+ 



(18) 



(19) 



( 20 ) 
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Proposition 3. For N sufficiently large, there exists c sueh that 

0 < cmaxjl, dj} < r“ + + r+, rj<0, r^<0. (21) 

Proof. We only need to study the cases where the new scheme differs from (8). From (18) and (19) it is obvious that $r_j^- < 0$ and $r_j^+ < 0$. Using that






hj+i hj 



+ ^j+i ,// 

h,+i) ^ 



^ 3!e 



{Sj,N/ffi'^ + ^j,3N/ffi'j+i) 



12(h 



2(fej+i - h^) 

5\{hj + hj+i) 



46, 6' 



> c> 0, 



the result follows from (20). 



Corollary 2. The new operator is of positive type and therefore it satisfies the discrete maximum principle.

We finally study the local truncation error at the transition points $\sigma$ and $1 - \sigma$. We show the case $x_j = \sigma$; the case $x_j = 1 - \sigma$ is analogous. The local truncation error at this point is given by



= 



2e 



+ 6,-|_i 



Rs{xj,Xj-i,u) 

6,'+i 



hj+i - hj ^ / R 3 {xj,Xj-i,u) fej Ri{xj,Xj+i,u) 



6j R2{xj,Xj-i,u) ^ 



3 ^ 

+ ^j+i 

12(6,- + hhj+i) ^ 

2(6|+i - 6|) 



3! 



6,+ie 



6' 



sgn y Mxj.Xj+i,u) ^ ^,^.^R2{xj,Xj+i,u) 



5\{hj +hj+i) 



36" 



jffRi(.Xj,Xj—i,u') . ,//. 

sgn 6, + (1 - sgn 6, ) 



6j+i 
Ri(xj,Xj+i,u) 



h 



'j+i 



( 22 ) 



Proposition 4. The local truncation error in the transition points satisfies 



Ku\ < c 



log2 N + djN-'/h'^ 



for j = N/4 or j = 3iV/4. 



Proof. In same way as in Proposition 1, using that h^^'i j ^ is bounded, h^^'> < 
2N~^ and h^‘^'> = 4:N~^^/elogN, it is straightforward (see [1] for details) to 
prove that 



< C 






+ e{xj,Xj+i,P,e) + e{xj-i,Xj,P,e) 



< C{N~^ul log^ N + iV-V^‘"«). 



< 



Using the uniform stability of the operator and the bounds for the local trunca- 
tion error, we obtain the following convergence result. 

Theorem 2. Let $u(x,\varepsilon)$ be the solution of (1) and $\{U_j;\ 0 \le j \le N\}$ the solution of the new finite difference scheme. Then,

$|u(x_j,\varepsilon) - U_j| \le C\left(N^{-4}\sigma_0^4\log^4 N + N^{-\sqrt{\beta}\sigma_0}\right),\quad 0 \le j \le N.$
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4 Numerical Experiments 

To confirm the theoretical result, we consider the problem 

— £m" + (1 + + cosx)m = + sinx, 0 < x < 1, m(0) = 1, m(1) = 1, 

whose exact solution is not known. Pointwise errors are estimated by $E_{\varepsilon,N} = \|U^N - \hat U^{2N}\|_\infty$, where $\hat U^{2N}$ is the approximate solution on the mesh $\hat x_{2i} = x_i \in I_N$, $i = 0,1,\ldots,N$, $\hat x_{2i+1} = (x_i + x_{i+1})/2$, $i = 0,1,\ldots,N-1$. We denote $E_N = \max_\varepsilon E_{\varepsilon,N}$. The numerical $\varepsilon$-uniform rate of convergence, calculated using the double-mesh principle (see [2]), is given by $p = \log(E_{\varepsilon,N}/E_{\varepsilon,2N})/\log 2$. In Table 1 we give the results obtained with the scheme (8) taking $\sigma_0 = 4$, which agree with the order given by Theorems 1 and 2.
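The rate formula $p = \log(E_N/E_{2N})/\log 2$ is easy to check against the tabulated errors. The following sketch recomputes the orders from the last row of Table 1 (the maximum errors for $N = 16, \ldots, 1024$); the helper name `numerical_orders` is hypothetical.

```python
import math

def numerical_orders(errors):
    """Orders p = log(E_N / E_2N) / log 2 for errors on meshes N, 2N, 4N, ..."""
    return [math.log(e_n / e_2n) / math.log(2.0)
            for e_n, e_2n in zip(errors, errors[1:])]

# Last row of Table 1: maximum pointwise errors for N = 16, 32, ..., 1024.
max_errors = [6.211e-3, 4.813e-3, 1.513e-3, 2.293e-4, 2.663e-5, 2.721e-6, 2.620e-7]
orders = numerical_orders(max_errors)
```

Each entry of `orders` compares two consecutive mesh sizes, so seven errors give six orders.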



Table 1. Pointwise errors and numerical order of convergence

ε                 N=16       N=32       N=64       N=128      N=256      N=512      N=1024

2''               1.166E-7   7.285E-9   4.554E-10  2.845E-11  1.778E-12  1.111E-13  6.944E-15
                  4.000      4.000      4.000      4.000      4.000      4.000

2-b               1.106E-4   7.703E-6   4.874E-7   3.056E-8   1.912E-9   1.195E-10  7.469E-12
                  3.844      3.982      3.995      3.998      4.000      4.000

                  6.211E-3   4.813E-3   1.347E-3   1.052E-4   7.333E-6   4.650E-7   2.921E-8
                  0.368      1.837      3.678      3.842      3.979      3.993

                  6.174E-3   4.801E-3   1.513E-3   2.293E-4   2.663E-5   2.721E-6   2.620E-7
                  0.363      1.666      2.722      3.106      3.291      3.377

                  6.174E-3   4.801E-3   1.513E-3   2.293E-4   2.663E-5   2.721E-6   2.620E-7
                  0.363      1.666      2.722      3.106      3.291      3.377

max_ε E_{ε,N}     6.211E-3   4.813E-3   1.513E-3   2.293E-4   2.663E-5   2.721E-6   2.620E-7
                  0.368      1.669      2.722      3.106      3.291      3.377
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Abstract. In this work a grid free Monte Carlo algorithm for solving 
elliptic boundary value problems is investigated. The proposed Monte 
Carlo approach leads to a random process called a ball process. 

In order to generate random variables with the desired distribution, re- 
jection techniques on two levels are used. 

Varied numerical tests on a Sun Ultra Enterprise 4000 with 14 Ultra- 
SPARC processors were performed. The code which implemented the 
new algorithm was written in JAVA. 

The numerical results show that the derived theoretical estimates can be 
used to predict the behavior of a wide class of elliptic boundary value 
problems. 



1 Introduction 

Consider the following three-dimensional elliptic boundary value problem:

$Mu = -\phi(x),\ x \in \Omega,\ \Omega \subset \mathbb{R}^3$ and $u = \psi(x),\ x \in \partial\Omega,$ (1)

where the differential operator $M$ is equal to

$M = \sum_{i=1}^{3}\left(\frac{\partial^2}{\partial x_i^2} + b_i(x)\frac{\partial}{\partial x_i}\right) + c(x)$.

We assume that the regularity conditions for the closed domain $\bar\Omega$ and the given functions $b(x)$, $c(x) \le 0$, $\phi(x)$ and $\psi(x)$ are satisfied. These conditions guarantee the existence and uniqueness of the solution $u(x)$ in $C^2(\Omega) \cap C(\bar\Omega)$ of problem (1) (see [1,5]), as well as the possibility of its local integral representation (when $\mathrm{div}\,b(x) = 0$) making use of the Green's function approach for standard domains lying inside the domain $\Omega$ (for example, a ball or an ellipsoid).

* Supported by ONR Grant N00014-96-1-1-1057 and by the National Science Fund of 
Bulgaria under Grant ^ I 811/1998. 



L. Vulkov, J. Wasniewski, and P. Yalamov (Eds.): NAA 2000, LNCS 1988, pp. 359—367, 2001. 
(c) Springer- Verlag Berlin Heidelberg 2001 
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Denote by B{x) the ball: B{x) = Br{x) = {y '■ r =\ y — x \< i?(a:)}, where 
$R(x)$ is the radius of the ball. Levy's function for the problem (1) is

$L_p(y,x) = \mu_p(R)\int_r^R (1/r - 1/\rho)\,p(\rho)\,d\rho,\quad r \le R,$ (2)

where the following notations are used: $p(\rho)$ is a density function;

$r = |x - y| = \left(\sum_{i=1}^{3}(x_i - y_i)^2\right)^{1/2}$, $\mu_p(R) = [4\pi q_p(R)]^{-1}$, $q_p(R) = \int_0^R p(\rho)\,d\rho$.

It is readily seen that Levy's function $L_p(y,x)$ and the parameters $q_p(R)$ and $\mu_p(R)$ depend on the choice of the density function $p(\rho)$. In fact, Eq. (2) defines a family of functions.

For the Levy’s function the following representation holds (see [4]): 



l{x)= / {u{y)M*Lp{y,x) + Lp{y,x)(l){y)) 
J B(x) 



dy 



/ 

ldB{x) 



B(x) 

Lp{y,x)du{y) u{y)dLp{y,x) 



- h{y)u{y)Lp{y,x) 



dyi dyi 

where n = (ni,ri 2 ,n 3 ) is the exterior normal to the boundary dB{x) and 



( 3 ) 



dyS , 






z{x). 



is the adjoint operator to M. 

It is proved (see [3]) that the conditions $M^*L_p(y,x) \ge 0$ (for any $y \in B(x)$) and $L_p(y,x) = \partial L_p(y,x)/\partial y_i = 0$, $i = 1,2,3$ (for any $y \in \partial B(x)$) are satisfied for $p(r) = e^{-kr}$, where

$k \ge b^* + Rc^*$, $b^* = \max_{x \in \Omega}|b(x)|$, $c^* = \max_{x \in \Omega}|c(x)|$

and $R$ is the radius of the maximal ball $B(x) \subset \Omega$.

This statement shows that it is possible to construct Levy's function by choosing the density $p(\rho)$ such that the kernel $M^*L_p(y,x)$ is non-negative in $B(x)$ and such that $L_p(y,x)$ and its derivatives vanish on $\partial B(x)$.

It follows that the representation (3) can be written in the form:

$u(x) = \int_{B(x)} M^*L_p(y,x)\,u(y)\,dy + \int_{B(x)} L_p(y,x)\,\phi(y)\,dy,$ (4)

where

$M^*L_p(y,x) = \mu_p(R)\frac{p(r)}{r^2} - \mu_p(R)\,c(y)\int_r^R\frac{p(\rho)}{\rho}\,d\rho + \frac{\mu_p(R)}{r^2}\left[c(y)\,r + \sum_{i=1}^{3} b_i(y)\frac{y_i - x_i}{r}\right]\int_r^R p(\rho)\,d\rho.$

The representation of $u(x)$ in (4) is the basis for the proposed Monte Carlo method. Using it, a biased estimator for the solution can be obtained.
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2 Monte Carlo Method 



The Monte Carlo procedure for solving Eq. (4) can be defined as a "ball process" or "walk on small spheres". Consider a transition density function

$p(x,y) \ge 0$ and $\int_{B(x)} p(x,y)\,dy = 1,$ (5)

and define a Markov chain $\xi_0, \xi_1, \ldots$ such that every point $\xi_j$, $j = 1,2,\ldots$, is chosen in the maximal ball $B(\xi_{j-1})$ lying in $\Omega$ in accordance with the density (5).

Generally, the "walk on small spheres" process can be written as follows (see [8]):

$\xi_j = \xi_{j-1} + w_j\,\alpha R(\xi_{j-1}),\quad j = 1,2,\ldots,\quad \alpha \in (0,1],$

where $w_j$ are independent unit isotropic vectors in $\mathbb{R}^3$. In particular, when $\alpha = 1$ the process is called "walk on spheres" (see [6,8]).

To ensure the convergence of the process under consideration we introduce the $\varepsilon$-strip of the boundary, i.e.

$\partial\Omega_\varepsilon = \{y \in \Omega : \exists x \in \partial\Omega \text{ for which } |y - x| \le \varepsilon\}.$

Thus the Markov chain terminates when it reaches $\partial\Omega_\varepsilon$ and the final point is $\xi_{l_\varepsilon} \in \partial\Omega_\varepsilon$.

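Consider the biased estimate for the solution of Eq. (4) at the point $\xi_0$ (see [2]):

$\Theta_{l_\varepsilon}(\xi_0) = \sum_{j=0}^{l_\varepsilon - 1} W_j\int_{B(\xi_j)} L_p(y,\xi_j)\,\phi(y)\,dy + W_{l_\varepsilon}\psi(\xi_{l_\varepsilon}),$ (6)

where

$W_0 = 1,\quad W_j = W_{j-1}\frac{M^*L_p(\xi_j,\xi_{j-1})}{p(\xi_{j-1},\xi_j)},\quad j = 1,2,\ldots,l_\varepsilon.$

If the first derivatives of the solution are bounded in $\Omega$ then the following inequality holds (see [6]):

$|E\Theta_{l_\varepsilon}(\xi_0) - u(\xi_0)|^2 \le c_1\varepsilon^2.$ (7)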

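Using $N$ independent samples we construct a random estimate of the form

$\bar\Theta_{l_\varepsilon}(\xi_0) = \frac{1}{N}\sum_{i=1}^{N}\Theta^{(i)}_{l_\varepsilon}(\xi_0) \approx u(\xi_0).$

The root mean square deviation is defined by the relation

$E(\bar\Theta_{l_\varepsilon}(\xi_0) - u(\xi_0))^2 = Var(\bar\Theta_{l_\varepsilon}(\xi_0)) + (u(\xi_0) - E\bar\Theta_{l_\varepsilon}(\xi_0))^2.$

Hence

$E(\bar\Theta_{l_\varepsilon}(\xi_0) - u(\xi_0))^2 \le \frac{d_0^2}{N} + c_1\varepsilon^2 = \mu^2,$ (8)

where $\mu$ is the desired error, $d_0^2$ is an upper bound of the variance and $c_1$ is the constant from Eq. (7).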
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3 A Grid Free Monte Carlo Algorithm 



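Using spherical coordinates [2] we can express the kernel $k(x,y) = M^*L_p(y,x)$ as follows:

$k(r,w) = \frac{p(r)\sin\theta}{q_p(R)\,4\pi}\left[1 + \frac{\sum_{i=1}^{3} b_i(x+rw)\,w_i + c(x+rw)\,r}{p(r)}\int_r^R p(\rho)\,d\rho - \frac{c(x+rw)\,r^2}{p(r)}\int_r^R\frac{p(\rho)}{\rho}\,d\rho\right].$

Here $w = (w_1,w_2,w_3)$ is a unit isotropic vector in $\mathbb{R}^3$, where $w_1 = \sin\theta\cos\varphi$, $w_2 = \sin\theta\sin\varphi$ and $w_3 = \cos\theta$ ($\theta \in [0,\pi)$ and $\varphi \in [0,2\pi)$).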
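A unit isotropic vector of this form can be generated as in the sketch below. It assumes the usual recipe of drawing $\cos\theta$ uniformly on $[-1,1]$ and $\varphi$ uniformly on $[0,2\pi)$, which gives a direction uniformly distributed on the unit sphere; the function name is hypothetical and this is not necessarily how the authors' code does it.

```python
import math
import random

def isotropic_unit_vector(rng):
    """Return w = (sin t cos f, sin t sin f, cos t), uniform on the unit sphere."""
    cos_theta = 1.0 - 2.0 * rng.random()          # cos(theta) uniform on (-1, 1]
    sin_theta = math.sqrt(1.0 - cos_theta * cos_theta)
    phi = 2.0 * math.pi * rng.random()            # azimuth uniform on [0, 2*pi)
    return (sin_theta * math.cos(phi),
            sin_theta * math.sin(phi),
            cos_theta)

rng = random.Random(12345)                        # fixed seed for reproducibility
samples = [isotropic_unit_vector(rng) for _ in range(10000)]
```

Drawing $\theta$ itself uniformly would cluster points near the poles, which is why the cosine is sampled instead.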

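Let us consider the following two non-negative functions

$p_0(r,w) = \frac{p(r)\sin\theta}{q_p(R)\,4\pi}\left[1 + \frac{\sum_{i=1}^{3} b_i(x+rw)\,w_i}{p(r)}\int_r^R p(\rho)\,d\rho\right],$

when $c(x+rw) \equiv 0$, and $p(r,w) = k(r,w)$, when $c(x+rw) \le 0$.

The following inequalities hold:

$p(r,w) \le p_0(r,w) \le \frac{p(r)\sin\theta}{q_p(R)\,4\pi}\left[1 + \frac{b^*}{p(r)}\int_r^R p(\rho)\,d\rho\right].$ (9)

We note that the function $p_0(r,w)$ satisfies the condition (5) (see [2]).

Denote by $\bar p(r,w)$ the following function:

$\bar p(r,w) = \frac{p(r,w)}{V}$, where $\int_0^{\pi}\int_0^{2\pi}\int_0^{R} p(r,w)\,dr\,d\theta\,d\varphi = V < 1.$ (10)

Introduce the functions:

$\bar p(w/r) = 1 + \frac{\sum_{i=1}^{3} b_i(x+rw)\,w_i + c(x+rw)\,r}{p(r)}\int_r^R p(\rho)\,d\rho - \frac{c(x+rw)\,r^2}{p(r)}\int_r^R\frac{p(\rho)}{\rho}\,d\rho,$

$\bar p_0(w/r) = 1 + \frac{\sum_{i=1}^{3} b_i(x+rw)\,w_i}{p(r)}\int_r^R p(\rho)\,d\rho.$

Using inequalities (9) we obtain:

$\bar p(w/r) \le \bar p_0(w/r) \le 1 + \frac{b^*}{p(r)}\int_r^R p(\rho)\,d\rho.$ (11)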



Now we can describe the grid free algorithm for simulating the Markov chain with transition density function (10). The Markov chain is started at the fixed point $\xi_0$. The inequalities in (11) are used to sample the next point by applying a two level acceptance-rejection sampling (ARS) rule.

The ARS rule, or the Neumann rule, can be used if another density function $v_2(x)$ exists such that $c_2v_2(x)$ is everywhere a majorant of the density function $v_1(x)$, that is, $c_2v_2(x) \ge v_1(x)$ for all values $x$ (see [2] for details). The efficiency of this rule depends upon $c_2v_2(x)$ and how closely it envelops $v_1(x)$. A two level ARS rule is preferable when $v_1(x)$ is a complex function. In this case a second majorant function must be found which envelops our density function very closely.
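The one level ARS rule can be illustrated with a small sketch. The names `accept_reject`, `target`, `envelope` and `sample_envelope` are hypothetical, and the example densities are chosen only for illustration: the target is $v_1(x) = 2x$ on $[0,1]$, the envelope density is uniform, and $c_2 = 2$.

```python
import random

def accept_reject(target, sample_envelope, envelope, c, rng):
    """Draw one sample from the density `target`.

    Requires c * envelope(x) >= target(x) for all x; `sample_envelope`
    draws a candidate distributed according to the density `envelope`.
    """
    while True:
        x = sample_envelope(rng)
        gamma = rng.random()                  # uniform on [0, 1)
        if gamma * c * envelope(x) <= target(x):
            return x                          # candidate accepted

rng = random.Random(2024)
draws = [accept_reject(lambda x: 2.0 * x,        # target density v1
                       lambda r: r.random(),     # sampler for the envelope density
                       lambda x: 1.0,            # envelope density v2
                       2.0, rng)
         for _ in range(20000)]
mean_draw = sum(draws) / len(draws)
```

The acceptance probability equals $1/c_2$ (here one half), which is why a tight majorant matters; a two level rule inserts a cheaper intermediate majorant so that the expensive target is evaluated less often.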

Algorithm 3.1

1. Compute the radius $R(\xi_0)$ of the maximal ball lying inside $\Omega$ and having center $\xi_0$.

2. Generate a random value $r$ of the random variable $\tau$ with the density

$\frac{p(r)}{q_p(R)} = \frac{ke^{-kr}}{1 - e^{-kR}}.$ (12)

3. Calculate the function

$h(r) = 1 + \frac{b^*}{p(r)}\int_r^R p(\rho)\,d\rho = 1 + \frac{b^*}{k}\left(1 - e^{-k(R-r)}\right).$

4. Generate the independent random values $w$ of a unit isotropic vector in $\mathbb{R}^3$.

5. Generate the independent random value $\gamma$ of a uniformly distributed random variable in the interval $[0,1]$.

6. Go to step 8 if the inequality holds: $\gamma h(r) \le \bar p_0(w/r)$.

7. Go to step 4 otherwise.

8. Generate the independent random value $\gamma$ of a uniformly distributed random variable in the interval $[0,1]$.

9. Go to step 11 if the inequality holds: $\gamma\bar p_0(w/r) \le \bar p(w/r)$.

10. Go to step 4 otherwise.

11. Compute the random point $\xi_1$, with a density $\bar p(w/r)$, using the following formula: $\xi_1 = \xi_0 + rw$. The value $r = |\xi_1 - \xi_0|$ is the radius of the sphere lying inside $\Omega$ and having center at $\xi_0$.

12. Repeat Algorithm 3.1 for the new point $\xi_1$ if $\xi_1 \notin \partial\Omega_\varepsilon$.

13. Stop Algorithm 3.1 if $\xi_1 \in \partial\Omega_\varepsilon$.

The random variable $\Theta_{l_\varepsilon}(\xi_0)$ is calculated using formula (6).
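Steps 2 and 3 of the algorithm admit closed forms when the density is $p(r) = e^{-kr}$. The sketch below assumes that choice and uses inverse-transform sampling for the radius; the function names and the parameter values in the example are hypothetical.

```python
import math
import random

def sample_radius(k, big_r, rng):
    """Step 2: sample r on [0, R) with density k e^{-k r} / (1 - e^{-k R}).

    Inverts the distribution function F(r) = (1 - e^{-k r}) / (1 - e^{-k R}).
    """
    u = rng.random()
    return -math.log(1.0 - u * (1.0 - math.exp(-k * big_r))) / k

def h(r, k, big_r, b_star):
    """Step 3: majorant h(r) = 1 + (b*/k) (1 - e^{-k (R - r)})."""
    return 1.0 + (b_star / k) * (1.0 - math.exp(-k * (big_r - r)))

rng = random.Random(7)
k, big_r, b_star = 2.0, 0.5, 1.0              # illustrative values only
radii = [sample_radius(k, big_r, rng) for _ in range(20000)]
mean_radius = sum(radii) / len(radii)
```

For these values the exact mean of the radius is $1/k - R/(e^{kR} - 1) \approx 0.209$, which the sample mean should approach.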

The computational cost of the algorithm under consideration is measured by the quantity

$S = N\,t_0\,El_\varepsilon,$

where $N$ is the number of trajectories performed; $El_\varepsilon$ is the average number of balls on a single trajectory; $t_0$ is the time of modeling a point in the maximal ball lying inside $\Omega$ and of computing the weight $W$ which corresponds to this point.

We note that for a wide class of boundaries $\partial\Omega$ (see [8,6]) the following estimate has been obtained on the basis of renewal theory: $El_\varepsilon = c_2|\ln\varepsilon|$. If the radius $r = r_0$ is fixed and $r_0/R = \alpha \in (0,1]$, then the following estimate holds (see [8]):

$El_\varepsilon = \frac{4R^2|\ln\varepsilon|}{r_0^2} + O(r_0^4),$

where $R$ is the radius of the maximal ball lying inside $\Omega$.

It is clear that the algorithmic efficiency of Algorithm 3.1 depends on the position of the points in the Markov chain. They must be located "not far from the boundary of the ball". Thus, the location of every point depends on the random variable $\tau$ with the density (12).

The following assertion holds (see [2]): 

Lemma 1. Let $\alpha_0 \in (0, 0.5)$. Then $E\tau \in (\alpha_0R, 0.5R)$ if and only if the radius $R$ of the maximal ball and the parameters $b^*$ and $c^*$ satisfy the inequality

$R(b^* + Rc^*) \le \beta_0,$ (13)

where $\beta_0$ is the solution of the equation $g_1(z) = \frac{1}{z} + \frac{1}{1 - e^z} = \alpha_0$.

Therefore, after the substitution $r_0 = \alpha_0R$, where $\alpha_0$ is the parameter from Lemma 1, the average number of balls satisfies

$El_\varepsilon \le \frac{4}{\alpha_0^2}|\ln\varepsilon|.$ (14)
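The constant $\beta_0$ of Lemma 1 has no closed form but is easy to compute numerically. The sketch below assumes that $g_1(z) = 1/z + 1/(1 - e^z)$, which is the ratio $E\tau/R$ for the truncated exponential density with $z = kR$, and solves $g_1(\beta_0) = \alpha_0$ by bisection; the function names and the bracketing interval are choices made here.

```python
import math

def g1(z):
    """g1(z) = 1/z + 1/(1 - e^z), written with expm1 for accuracy near 0."""
    return 1.0 / z - 1.0 / math.expm1(z)

def beta0(alpha0):
    """Solve g1(z) = alpha0 by bisection; g1 decreases from 1/2 towards 0."""
    lo, hi = 1e-6, 700.0                      # bracket chosen for this sketch
    if not (g1(hi) < alpha0 < g1(lo)):
        raise ValueError("alpha0 outside the bracketed range")
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if g1(mid) > alpha0:
            lo = mid                          # root lies to the right
        else:
            hi = mid
    return 0.5 * (lo + hi)

b0 = beta0(0.25)                              # example: alpha0 = 0.25
```

With $\beta_0$ in hand, condition (13) becomes a simple check on $R(b^* + Rc^*)$ for a given problem.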

In order to obtain an error of order $\mu$ (see Eq. (8)), the optimal order of the quantities $N$ and $\varepsilon$ must be

$N = O(\mu^{-2}),\quad \varepsilon = O(\mu).$

Note that this estimate of the computational cost is optimal as to the order of magnitude of $\mu$ only. It does not take into account the values of the constants in (8).

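In order to minimize the computational cost we should solve the conditional minimum problem (see [6]):

$S = N\,t_0\,El_\varepsilon \to \min_{N,\varepsilon},\quad \frac{d_0^2}{N} + c_1\varepsilon^2 = \mu^2,$

or

$S = \frac{d_0^2\,t_0}{\mu^2 - c_1\varepsilon^2}\,El_\varepsilon \to \min_{\varepsilon},\quad c_1\varepsilon^2 \le \mu^2.$

Having solved this problem we obtain the optimal values of the quantities $N$, $S$ and $\varepsilon$:

$N^* = \frac{d_0^2}{2c_1\varepsilon^{*2}|\ln\varepsilon^*|},\quad S^* = \frac{2d_0^2\,t_0}{c_1\alpha_0^2\varepsilon^{*2}},$

where $\varepsilon^*$ is a solution of the equation

$c_1\varepsilon^2 + 2c_1\varepsilon^2|\ln\varepsilon| = \mu^2.$ (15)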
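The balance between $\varepsilon^*$ and $N^*$ can be explored numerically. The sketch below assumes the forms $c_1\varepsilon^2(1 + 2|\ln\varepsilon|) = \mu^2$ for Eq. (15) and $N^* = d_0^2/(2c_1\varepsilon^{*2}|\ln\varepsilon^*|)$; the function names and the constants $c_1 = d_0^2 = 1$ used in the example are hypothetical.

```python
import math

def balance_lhs(eps, c1):
    """Left-hand side of Eq. (15): c1 eps^2 (1 + 2 |ln eps|)."""
    return c1 * eps * eps * (1.0 + 2.0 * abs(math.log(eps)))

def optimal_eps(mu, c1):
    """Solve Eq. (15) on (0, 1) by bisection (the left-hand side is increasing)."""
    lo, hi = 1e-12, 1.0
    if mu * mu > balance_lhs(hi, c1):
        raise ValueError("no root in (0, 1) for this mu")
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if balance_lhs(mid, c1) < mu * mu:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def optimal_n(eps_star, c1, d0_sq):
    """Optimal number of trajectories for a given eps*."""
    return d0_sq / (2.0 * c1 * eps_star ** 2 * abs(math.log(eps_star)))

# Round trip: choose eps0, compute the matching mu, then recover eps0.
mu = math.sqrt(balance_lhs(0.01, 1.0))
eps_star = optimal_eps(mu, 1.0)
# Ratio of optimal sample sizes for eps* = 0.001 and eps* = 0.01.
ratio = optimal_n(0.001, 1.0, 1.0) / optimal_n(0.01, 1.0, 1.0)
```

The ratio is independent of $c_1$ and $d_0^2$ and equals $100 \cdot \ln(0.01)/\ln(0.001) \approx 66.7$, which is close to the ratio $202130/3032$ of the sample sizes reported in Table 1.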



It is not difficult to estimate the variance VariOi^ (Co)) when the function = 
0. In this case we have 0z^(Co) = Thus 

Var{0iMo))<E{el{fo)) = 



(M*£p(ei,Co))^ 

B(Co) P(Co,Cl) 



(M*£p(e4,C4-i))^ 
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Denote by 



V = max 



/ M*Lp(y,x)dy and •0* = max / l-i/j(y)l^dy 

Jb{x) 



Now we obtain < y 2 i ^^2 < ^ 2 _ 

Thus in this case, the optimal values of the quantities N ^ S get: 

i’l o* _ 



N* = 



2ciel\ Ine* 



S* 



2 9 ■ 

ciag ei 



where the constant $c_1$ depends on the condition that the first derivatives of the solution be bounded in $\Omega$, and $\alpha_0$ depends on Eq. (13).



4 Numerical Result 

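As an example the following boundary value problem was solved in the cube $\Omega = [0,1]^3$:

$\sum_{i=1}^{3}\left(\frac{\partial^2u}{\partial x_i^2} + b_i(x)\frac{\partial u}{\partial x_i}\right) + c(x)u = 0$ in $\Omega$,

$u(x_1,x_2,x_3) = e^{a_1(x_1+x_2+x_3)}$, $(x_1,x_2,x_3) \in \partial\Omega$.

In our tests $b_1(x) = a_2x_1(x_2 - x_3)$, $b_2(x) = a_2x_2(x_3 - x_1)$, $b_3(x) = a_2x_3(x_1 - x_2)$, and $c(x) = -3a_1^2$, where $a_1$ and $a_2$ are parameters.

We note that the condition $\mathrm{div}\,b(x) = 0$ is satisfied.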
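The reference values quoted in the table captions can be reproduced directly. The sketch below assumes the exact solution $u(x) = e^{a_1(x_1+x_2+x_3)}$ and the coefficients $b_i$ read as $a_2x_1(x_2-x_3)$, $a_2x_2(x_3-x_1)$, $a_2x_3(x_1-x_2)$; the function names are hypothetical.

```python
import math

def exact_solution(x, a1):
    """u(x) = exp(a1 (x1 + x2 + x3))."""
    return math.exp(a1 * sum(x))

def b_field(x, a2):
    """Drift coefficients b_1, b_2, b_3 of the test problem."""
    x1, x2, x3 = x
    return (a2 * x1 * (x2 - x3), a2 * x2 * (x3 - x1), a2 * x3 * (x1 - x2))

center = (0.5, 0.5, 0.5)
u_table1 = exact_solution(center, 0.25)   # compare with u(x) in the Table 1 caption
u_table2 = exact_solution(center, 0.5)    # compare with u(x) in the Table 2 caption
# All partial derivatives of u equal a1 * u, so b . grad(u) = a1 * u * (b1 + b2 + b3).
b_sum = sum(b_field((0.3, 0.7, 0.2), 8.0))
```

Since $b_1 + b_2 + b_3$ vanishes identically, the first order terms drop out for this $u$, and $\Delta u = 3a_1^2u$ is balanced by $c(x) = -3a_1^2$.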

The code which implemented the algorithm under consideration was written in JAVA. The multiplicative linear-congruential generator, which was used to obtain a sequence of random numbers distributed uniformly between 0 and 1, is $x_n = 7^5x_{n-1} \bmod (2^{31} - 1)$. It was highly recommended by Park and Miller [7] and they called it the "minimal standard".
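The generator is short enough to write out in full. The sketch below is in Python rather than the JAVA used by the authors, and the function names are hypothetical; Python integers do not overflow, so the product can be reduced directly.

```python
MODULUS = 2 ** 31 - 1     # 2147483647
MULTIPLIER = 7 ** 5       # 16807

def next_state(x):
    """One step of x_n = 7^5 * x_{n-1} mod (2^31 - 1)."""
    return (MULTIPLIER * x) % MODULUS

def uniform_stream(seed, count):
    """Return `count` pseudo-random numbers in (0, 1) starting from `seed`."""
    x = seed
    values = []
    for _ in range(count):
        x = next_state(x)
        values.append(x / MODULUS)
    return values

stream = uniform_stream(1, 1000)
```

In a language with 32-bit integers the product $7^5x$ overflows, so an implementation there needs either 64-bit arithmetic or a decomposition of the modulus.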

Numerical tests on a Sun Ultra Enterprise 4000 with 14 UltraSPARC processors were performed for different values of the parameters $a_1$ and $a_2$ (see Tables 1, 2). The solution was estimated at the point with coordinates $x = (0.5, 0.5, 0.5)$. In the tables, $u(x)$ is the exact solution, $u_{l_\varepsilon}(x)$ is the estimate of the solution, $\mu_\varepsilon$ is the estimate of the corresponding mean square error, and $\sigma^2$ is the estimate of $Var(\Theta_{l_\varepsilon}(x))$. The results presented in Table 1 are in good agreement with the theoretical ones (see Eqs. (14), (15)). Moreover, the results presented in Table 2 show how important it is to have a good balance between the stochastic and systematic error. When $N^* = 50533$, the time of estimating the solution is $t_1$ = 51m14.50s, and when $N = 10^6$ the time is $t_2$ = 19h17m44.25s. Thus, the computational effort in the first case is about twenty times smaller than in the second one, while the Monte Carlo solutions are approximately equal. On the other hand, the numerical tests show that the variance does not depend on the vector-function $b(x)$.
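The "about twenty times" figure can be checked from the quoted timings and sample sizes. The short computation below uses only the numbers given in the text; the helper name `seconds` is hypothetical.

```python
def seconds(hours, minutes, secs):
    """Convert a duration to seconds."""
    return 3600.0 * hours + 60.0 * minutes + secs

t1 = seconds(0, 51, 14.50)       # run with N* = 50533
t2 = seconds(19, 17, 44.25)      # run with N = 10**6
time_ratio = t2 / t1
sample_ratio = 10 ** 6 / 50533
```

The sample sizes differ by a factor of about 19.8 and the measured times by a factor of about 22.6, so both are consistent with the statement in the text.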
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Table 1. u{x) = 1.4549915 , ai = 0.25, c* = 0.1875, R^ax = 0.5 



b* = a_2√3    ε* = 0.01, N* = 3032                        ε* = 0.001, N* = 202130
              u_{l_ε}(x) ± μ_ε      σ²       E l_ε        u_{l_ε}(x) ± μ_ε      σ²        E l_ε
8√3           1.465218 ± 0.029      0.0437   63.73        1.456395 ± 0.0035     0.04455   95.75
4√3           1.460257 ± 0.029      0.0427   43.62        1.456704 ± 0.0035     0.04448   74.92
2√3           1.465602 ± 0.029      0.0434   38.85        1.457545 ± 0.0035     0.04466   69.46
√3            1.456592 ± 0.029      0.0423   36.96        1.456211 ± 0.0035     0.04462   67.95
√3/4          1.461289 ± 0.029      0.0432   36.46        1.456149 ± 0.0035     0.04455   67.32
√3/16         1.456079 ± 0.029      0.0428   36.72        1.455545 ± 0.0035     0.04450   67.12

Table 2. u(x) = 2.117, a_1 = 0.5, c* = 0.75, R_max = 0.5

b* = a_2√3    ε* = 0.001, N* = 50533                      ε = 0.001, N = 1000000
              u_{l_ε}(x) ± μ_ε      σ²       E l_ε        u_{l_ε}(x) ± μ_ε      σ²        E l_ε
8√3           2.12829 ± 0.015       0.374    97.41        2.12749 ± 0.015       0.3770    97.00
4√3           2.13151 ± 0.015       0.3782   75.93        2.12972 ± 0.015       0.3787    75.68
2√3           2.12878 ± 0.015       0.3775   70.17        2.12832 ± 0.015       0.3790    70.05
√3            2.12898 ± 0.015       0.3781   68.39        2.12547 ± 0.015       0.3774    67.95
√3/4          2.12227 ± 0.015       0.3771   67.71        2.12125 ± 0.015       0.3760    67.63
√3/16         2.12020 ± 0.015       0.3753   67.61        2.11869 ± 0.015       0.3750    67.53


5 Conclusion 

In this work it is shown that the grid free Monte Carlo algorithm under consideration can be successfully applied to solving elliptic boundary value problems. An estimate for the minimization of the computational cost is obtained. The balancing of errors (both systematic and stochastic) either reduces the computational complexity when the desired error is fixed or increases the accuracy of the solution when the desired computational complexity is fixed.

The studied algorithm is easily programmable and parallelizable and can be efficiently implemented on MIMD machines.
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Conditions 



Jose M. Gutierrez and Miguel A. Hernandez* 

Universidad de La Rioja, Departamento de Matematicas y Computacion, 
26004 Logrono, SPAIN 



Abstract. The classical Kantorovich theorem for Newton's method assumes that the derivative of the involved operator satisfies a Lipschitz condition

$\|F'(x_0)^{-1}[F'(x) - F'(y)]\| \le L\|x - y\|.$

In this communication, we analyse different modifications of this condition, with special emphasis on the center-Lipschitz condition

$\|F'(x_0)^{-1}[F'(x) - F'(x_0)]\| \le \omega(\|x - x_0\|),$

$\omega$ being a positive increasing real function and $x_0$ the starting point for Newton's iteration.



In this paper we survey the convergence of Newton’s method in Banach spaces.
Let $X$, $Y$ be two Banach spaces and let $F : X \to Y$ be a Fréchet
differentiable operator. Starting from $x_0 \in X$, the well-known Newton’s
method is defined by the iterates

$x_{n+1} = x_n - F'(x_n)^{-1} F(x_n), \quad n = 0, 1, 2, \ldots$ (1)

provided that the inverse of the linear operator $F'(x_n)$ is defined at each step.

Under different conditions on the operator $F$, on the starting point $x_0$ or
even on the solution, it is shown that the sequence (1) converges to a solution
$x^*$ of the equation $F(x) = 0$.

In broad outline, three types of convergence results can be given:

— Local convergence: The existence of a solution $x^*$ is assumed. In addition,
the invertibility of $F'(x^*)$ and a Lipschitz-type condition are required:

$\|F'(x^*)^{-1}[F'(x) - F'(y)]\| \le \beta\|x - y\|, \quad x, y \in \Omega \subseteq X.$ (2)

— Semilocal convergence: Conditions on the starting point $x_0$, instead of on
the solution $x^*$, are assumed. Two types of semilocal results can be
distinguished:

* Research of both authors has been supported by a grant of the Universidad de La 
Rioja (ref. API-99/B14) and two grants of the DGES (refs. PB98-0198 and PB96- 
0120-C03-02). 
• Kantorovich conditions: The inverse of $F'(x_0)$ exists and the following
conditions are fulfilled:

$\|F'(x_0)^{-1} F(x_0)\| \le a,$

$\|F'(x_0)^{-1}[F'(x) - F'(y)]\| \le b\|x - y\|, \quad x, y \in \Omega_1 \subseteq X,$ (3)

$ab \le 1/2.$

• Smale’s α-theory: The Lipschitz condition on a domain (3) is replaced
by a punctual condition on $F$ and its derivatives. Let

$\|F'(x_0)^{-1} F(x_0)\| \le a, \qquad \sup_{k \ge 2} \left\| \frac{1}{k!} F'(x_0)^{-1} F^{(k)}(x_0) \right\|^{1/(k-1)} \le \gamma.$

Then $a\gamma \le 3 - 2\sqrt{2}$ is a sufficient condition for the convergence of
Newton’s method.

— Global convergence: Monotone convergence of (1) is established, in general, 
under convexity conditions on the operator F. 

In this paper we analyze some modifications of the Kantorovich-type conditions,
mainly modifications of (3). First of all, recall that the Kantorovich
conditions guarantee the existence and uniqueness of the solution in given balls
around $x_0$ and the quadratic convergence of (1). Results of this kind can be
proved by finding a majorizing sequence $\{t_n\}$, that is, a real sequence satisfying

$\|x_{n+1} - x_n\| \le t_{n+1} - t_n, \quad n \ge 0.$

The sequence $\{t_n\}$ is shown to be Newton’s method applied to the equation

$p(t) = 0, \quad \text{where } p(t) = \frac{b}{2} t^2 - t + a.$ (4)

For more information about these topics, consult the basic reference text [5].
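The majorizing sequence can be generated numerically. The following is a minimal sketch, assuming the sequence starts at $t_0 = 0$ (the usual choice, though not stated above) and using illustrative constants `a` and `b` that are not taken from the paper; the helper name `majorizing_sequence` is hypothetical.

```python
import math

def majorizing_sequence(a, b, steps=8):
    """Newton's method applied to p(t) = (b/2) t^2 - t + a, started at t_0 = 0 (assumed)."""
    t = 0.0
    seq = [t]
    for _ in range(steps):
        p = 0.5 * b * t * t - t + a   # p(t_n)
        dp = b * t - 1.0              # p'(t_n), negative while t_n < 1/b
        t = t - p / dp
        seq.append(t)
    return seq

# Illustrative constants (not from the paper) with a*b = 0.4 <= 1/2.
a, b = 0.4, 1.0
seq = majorizing_sequence(a, b)
t_star = (1.0 - math.sqrt(1.0 - 2.0 * a * b)) / b   # smaller root of p
```

Under these assumptions the iterates increase towards the smaller root of $p$, which is the radius of the ball in which the solution is located.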

The first modification we comment on here is due to Wang Zhenda. In his
paper [4], he considers Newton’s method under the following Lipschitz condition
on the second derivative:

$\|F'(x_0)^{-1}[F''(x) - F''(y)]\| \le b_2\|x - y\|, \quad x, y \in \Omega_2 \subseteq X.$ (5)

Then, supposing that $\|F'(x_0)^{-1}F(x_0)\| \le a$ and $\|F'(x_0)^{-1}F''(x_0)\| \le c$, the
convergence of (1) follows from the convergence of the Newton sequence applied to
a cubic polynomial.

Almost at the same time, we proved in [2] the convergence of (1) under
the weaker condition:

$\|F'(x_0)^{-1}[F''(x) - F''(x_0)]\| \le b_2\|x - x_0\|, \quad x \in \Omega_3 \subseteq X.$ (6)
On the one hand, this condition is more restrictive than (3) because it concerns
the second derivative instead of the first one. On the other hand, the Lipschitz
condition is weakened, because one of the points is fixed. Conditions of this kind
are known as center Lipschitz conditions. In addition, notice that (6) is weaker
than (5). In fact, (5) implies (6).

The technique followed in [2] is different from the one of [4]. However, in both
cases the convergence follows from the study of a third-order polynomial, and the
existence of a positive root for the corresponding polynomial is assumed.

To be concrete, let us state both results: the famous Kantorovich theorem
and the theorem given in [2].

Theorem 1 (Kantorovich). Let $F$ be a differentiable operator defined in an
open ball $\Omega = B(x_0, R) = \{x \in X;\ \|x - x_0\| < R\}$. Let us assume that
$\Gamma_0$, the inverse of $F'(x_0)$, is defined and

$\|\Gamma_0 F(x_0)\| \le a, \qquad \|\Gamma_0[F'(x) - F'(y)]\| \le b\|x - y\|, \quad x, y \in \Omega.$

Then, if $ab \le 1/2$ and $t^* < R$, Newton’s method (1) converges to $x^*$, a solution of
the equation $F(x) = 0$. In addition, the solution is located in $B(x_0, t^*)$ and is
unique in $B(x_0, \rho_0)$, where $\rho_0 = \min\{t^{**}, R\}$. Here, we have denoted

$t^* = \frac{1 - \sqrt{1 - 2ab}}{b}, \qquad t^{**} = \frac{1 + \sqrt{1 - 2ab}}{b}.$

Notice that $t^*$ and $t^{**}$ are the roots of the polynomial $p(t)$ defined in (4).
The condition $ab \le 1/2$ guarantees the existence of such roots.

Theorem 2 ([2]). Let $F$ be a twice differentiable operator defined in an open
ball $\Omega = B(x_0, R)$. Let us assume that $\Gamma_0$, the inverse of $F'(x_0)$, is defined and

$\|F'(x_0)^{-1}F(x_0)\| \le a, \qquad \|F'(x_0)^{-1}F''(x_0)\| \le c,$

$\|F'(x_0)^{-1}[F''(x) - F''(x_0)]\| \le b_2\|x - x_0\|, \quad x \in \Omega.$

Then, if the polynomial

$q(t) = \frac{b_2}{6} t^3 + \frac{c}{2} t^2 - t + a,$ (7)

has two positive roots $r_1$, $r_2$ ($r_1 \le r_2$) and $r_1 < R$, Newton’s method (1)
converges to $x^*$, a solution of the equation $F(x) = 0$. In addition, the solution is
located in $B(x_0, r_1)$ and is unique in $B(x_0, \rho_1)$, where $\rho_1 = \min\{r_2, R\}$.

The following condition

$6ac^3 + 9a^2b_2^2 + 18acb_2 \le 3c^2 + 8b_2,$

is equivalent to the existence of positive roots for the polynomial $q(t)$ defined in (7).
Theorems 1 and 2 are not comparable, as we show in the following examples.
Example 1. Let $\Omega = X = Y = \mathbb{R}$, $x_0 = 0$ and $f : X \to Y$ the polynomial

$f(x) = \frac{1}{6}x^3 + \frac{1}{6}x^2 - \frac{5}{6}x + \frac{1}{3}.$

In this case, $a = 2/5$ and $b = 8/5$. Then $ab = 16/25 > 1/2$ and the Kantorovich
condition fails. However, with the above notation, we have $a = 2/5$, $c = 2/5$
and $b_2 = 6/5$. Then,

$6ac^3 + 9a^2b_2^2 + 18acb_2 = 5.6832 < 3c^2 + 8b_2 = 10.08.$

So, the corresponding polynomial (7) has two positive roots and Theorem 2 can
be applied.
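The arithmetic of Example 1 can be checked exactly. The sketch below only re-evaluates the two conditions with the constants quoted in the example, using rational arithmetic; the variable names are illustrative.

```python
from fractions import Fraction as Fr

# Constants quoted in Example 1.
a, b, c, b2 = Fr(2, 5), Fr(8, 5), Fr(2, 5), Fr(6, 5)

# Kantorovich condition of Theorem 1: a*b <= 1/2.
kantorovich_ok = a * b <= Fr(1, 2)

# Root-existence condition for the cubic q(t) of Theorem 2.
lhs = 6 * a * c**3 + 9 * a**2 * b2**2 + 18 * a * c * b2
rhs = 3 * c**2 + 8 * b2
theorem2_ok = lhs <= rhs

print(kantorovich_ok, theorem2_ok, float(lhs), float(rhs))
```

Exact fractions avoid any rounding doubt in the comparison: the first condition fails while the second holds.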



Example 2. Let $\Omega = X = Y = \mathbb{R}$, $x_0 = 0$ and $f : X \to Y$ the function

$f(x) = \sin x - 5x - 8.$

Now, $a = 2$, $c = 0$ and $b = b_2 = 1/4$. Then $ab = 1/2$ and the hypothesis of the
Kantorovich theorem holds. However, in this case, the polynomial (7) is

$q(t) = \frac{1}{24} t^3 - t + 2,$

which has no positive roots, so we cannot use Theorem 2.
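As an illustration of Example 2, the following sketch runs Newton's iteration on $f(x) = \sin x - 5x - 8$ from $x_0 = 0$ and compares the distance to the computed root with the Kantorovich radius. The helper `newton`, its tolerance and its iteration cap are illustrative choices, not part of the paper.

```python
import math

def newton(f, df, x0, tol=1e-14, max_iter=50):
    """Scalar Newton iteration x_{n+1} = x_n - f(x_n) / f'(x_n)."""
    x = x0
    for _ in range(max_iter):
        step = f(x) / df(x)
        x -= step
        if abs(step) < tol:
            break
    return x

def f(x):
    return math.sin(x) - 5.0 * x - 8.0

def df(x):
    return math.cos(x) - 5.0

root = newton(f, df, 0.0)

# Constants of Example 2: a = 2, b = 1/4, so a*b = 1/2 and the square root vanishes.
a, b = 2.0, 0.25
t_star = (1.0 - math.sqrt(max(0.0, 1.0 - 2.0 * a * b))) / b
```

The first step moves from 0 to -2, so its length equals $a = 2$, and the computed root stays within distance $t^*$ of $x_0$, as Theorem 1 predicts.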



Sometimes, the convergence of (1) can be established by using either Theorem 1
or Theorem 2. Then we wonder which result gives more accurate information
on the solutions of $F(x) = 0$.

Let us consider the polynomials $p$ and $q$ defined in (4) and (7) respectively.
We have denoted by $t^*$, $t^{**}$ the roots of $p$ and by $r_1$, $r_2$ the roots of $q$. Then

$q(t^*) = \frac{(t^*)^2}{6}\left[b_2 t^* - 3(b - c)\right], \qquad q(t^{**}) = \frac{(t^{**})^2}{6}\left[b_2 t^{**} - 3(b - c)\right].$

Observe that

$q(t^*) \le 0 \iff b_2\left(1 - \sqrt{1 - 2ab}\right) \le 3b(b - c),$

$q(t^{**}) \le 0 \iff b_2\left(1 + \sqrt{1 - 2ab}\right) \le 3b(b - c).$

Our goal now is to get the smallest region where the solution is located and
the biggest one where this solution is unique. We distinguish three situations:

1. $b_2\left(1 + \sqrt{1 - 2ab}\right) \le 3b(b - c)$. Then $r_1 \le t^*$, $t^{**} \le r_2$ and, consequently, the
solution $x^*$ is located in $B(x_0, r_1)$ and is unique in $B(x_0, r_2)$.

2. $b_2\left(1 - \sqrt{1 - 2ab}\right) \le 3b(b - c) < b_2\left(1 + \sqrt{1 - 2ab}\right)$. In this situation $r_1 \le t^*$,
$r_2 < t^{**}$, so the solution $x^*$ belongs to $B(x_0, r_1)$ and is the only one in
$B(x_0, t^{**})$.

3. $3b(b - c) < b_2\left(1 - \sqrt{1 - 2ab}\right)$. Now we have $t^* < r_1$, $r_2 < t^{**}$, thus $x^*$ is
located in $B(x_0, t^*)$ and is unique in $B(x_0, t^{**})$.

In cases 1 and 3 we get the best information from Theorems 2 and 1
respectively. But in the second case, the best information is obtained by mixing
both results.
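The comparison of the two theorems can be sketched numerically. The code below computes the roots of the quadratic and of the cubic and combines the two results by taking the smaller location radius and the larger uniqueness radius. The constants are illustrative (not from the paper), the helper names are hypothetical, and the cubic roots are found by plain bisection around the positive critical point of the cubic.

```python
import math

def kantorovich_radii(a, b):
    """Roots t* <= t** of p(t) = (b/2) t^2 - t + a (requires a*b <= 1/2)."""
    d = math.sqrt(1.0 - 2.0 * a * b)
    return (1.0 - d) / b, (1.0 + d) / b

def cubic_radii(a, c, b2, iters=200):
    """Positive roots r1 <= r2 of q(t) = (b2/6) t^3 + (c/2) t^2 - t + a, or None."""
    def q(t):
        return b2 * t**3 / 6.0 + c * t * t / 2.0 - t + a

    t_min = (-c + math.sqrt(c * c + 2.0 * b2)) / b2  # positive zero of q'
    if q(t_min) > 0.0:
        return None  # q stays positive on [0, inf): no positive roots
    hi = 2.0 * t_min
    while q(hi) <= 0.0:  # q grows like t^3, so this loop terminates
        hi *= 2.0

    def bisect(lo, up):
        # q changes sign between lo and up.
        for _ in range(iters):
            mid = 0.5 * (lo + up)
            if (q(lo) > 0.0) == (q(mid) > 0.0):
                lo = mid
            else:
                up = mid
        return 0.5 * (lo + up)

    return bisect(0.0, t_min), bisect(t_min, hi)

# Illustrative constants for which both theorems apply.
a, b, c, b2 = 0.2, 1.0, 0.5, 0.5
t1, t2 = kantorovich_radii(a, b)
r1, r2 = cubic_radii(a, c, b2)
location_radius = min(t1, r1)      # smallest ball containing the solution
uniqueness_radius = max(t2, r2)    # largest ball in which it is unique
```

For these constants $b_2(1 + \sqrt{1 - 2ab}) \le 3b(b - c)$, so the example falls into the first situation and both radii come from the cubic.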

The next modification we comment on here consists in considering a center
Lipschitz condition for the first derivative, that is,

$\|\Gamma_0[F'(x) - F'(x_0)]\| \le b_3\|x - x_0\|, \quad x \in \Omega.$ (8)

First, we notice that this condition is weaker than the classical Lipschitz
condition (3). Obviously, (3) implies (8) but the converse is not true. So, there
are functions satisfying (8) but not (3). For instance, $f(x) = \sqrt{x}$ defined in
$\Omega = [0, \infty)$ is not a Lipschitz function in $\Omega$. Nevertheless, if we take $x_0 = 1$, we
obtain

$|f(x) - f(x_0)| = |\sqrt{x} - 1| = \frac{|x - 1|}{1 + \sqrt{x}} \le |x - x_0|, \quad \forall x \in \Omega.$
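A quick numerical check of the square-root example, as a minimal sketch: the center inequality at $x_0 = 1$ holds on a sample of points, while the difference quotient against the point 0 grows without bound, so no global Lipschitz constant exists. The sample grid is an arbitrary choice.

```python
import math

# Sample points spanning many orders of magnitude in [0, inf).
xs = [10.0 ** e for e in range(-8, 5)]

# Center condition at x0 = 1: |sqrt(x) - 1| <= |x - 1|.
center_ok = all(abs(math.sqrt(x) - 1.0) <= abs(x - 1.0) for x in xs)

# Difference quotients |sqrt(x) - sqrt(0)| / |x - 0| = 1 / sqrt(x) near zero.
quotients = [math.sqrt(x) / x for x in xs[:4]]
```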

Newton’s method under condition (8) has been studied in [3], where the
following result can be found:

Theorem 3 ([3]). Let $F$ be a differentiable operator defined in an open ball
$\Omega = B(x_0, R)$. Let us assume that $\Gamma_0$, the inverse of $F'(x_0)$, is defined and

$\|\Gamma_0 F(x_0)\| \le a, \qquad \|\Gamma_0[F'(x) - F'(x_0)]\| \le L\|x - x_0\|, \quad x \in \Omega.$

Then, if $aL \le (14 - 4\sqrt{6})/25 = 0.1680816\ldots$ and $\delta_1 < R$, Newton’s
method (1) converges to a solution $x^*$ of the equation $F(x) = 0$. In addition,
the solution is located in $B(x_0, \delta_1)$ and is unique in $B(x_0, \rho_2)$, where
$\rho_2 = \min\{\delta_2, R\}$. Here, we have denoted

^ 2 + 5aL- J25{aLff - 28aL + 4 „ 2 , 

Ji = = 

To prove this result, the authors follow a technique based on the use of
recurrence relations instead of the classical majorizing sequences. The idea of
the proof has also been used by Rokne in his classical paper [6]. In his general
Theorem 1, Rokne assumes a center Lipschitz condition together with a Lipschitz
condition. Consequently, the hypotheses of Rokne’s result are more restrictive
than the ones in the previous theorem.

Finally, the last modification we consider here is a generalization of (8):

$\|\Gamma_0[F'(x) - F'(x_0)]\| \le \omega(\|x - x_0\|), \quad \forall x \in B(x_0, R),$ (9)

where $\omega$ is a real function such that $\omega'(t) > 0$ for $t \in [0, R]$ and $\omega(0) = 0$.



Newton’s Method under Different Lipschitz Conditions 






As particular cases in (9) we have the center Lipschitz case ($\omega(t) = Lt$), the
center Hölder case ($\omega(t) = Lt^p$, $0 < p < 1$), combinations of both of them, etc.

To study Newton’s method under this condition we must first define
the non-linear second-order recurrence relation

$r_0 = 0; \quad r_1 = a; \quad r_{k+1} = r_k + \frac{1}{1 - \omega(r_k)} \int_{2r_{k-1}}^{r_{k-1} + r_k} \tilde{\omega}(s)\, ds,$ (10)

where $\tilde{\omega}(s) = \sup\{\omega(u) + \omega(v);\ u + v = s\}$. This function $\tilde{\omega}$ has been introduced
in [1] for the study of Newton’s method.

Theorem 4. Let $F$ be an operator defined in the open ball $\Omega = B(x_0, R)$. Let
us assume that $F$ is differentiable in $\Omega$ and $\Gamma_0 = F'(x_0)^{-1}$ is defined, with
$\|\Gamma_0 F(x_0)\| \le a$. Let us suppose that condition (9) holds. If the sequence $\{r_k\}$
defined in (10) is increasing, with a limit $r^* < R$ such that $\omega(r^*) < 1$, then
Newton’s iterates $\{x_k\}$ are well defined and

$\|x_{k+1} - x_k\| \le r_{k+1} - r_k.$ (11)

Consequently, $\{x_k\}$ converges to a limit $x^*$ that is a solution of $F(x) = 0$. This
solution is located in $B(x_0, r^*)$. In addition, if $\omega(x) > 1$ for some $x > r^*$, the
solution is unique in $B(x_0, \tau)$, where $\tau$ is the only solution of the equation

$\frac{1}{x - r^*} \int_{r^*}^{x} \omega(s)\, ds = 1, \quad x > r^*.$ (12)

Proof. We proceed inductively. First, (11) is clear for $k = 0$:

$\|x_1 - x_0\| = \|\Gamma_0 F(x_0)\| \le a = r_1 - r_0.$

Now, let us assume that $\|x_{j+1} - x_j\| \le r_{j+1} - r_j$ for $j = 0, 1, \ldots, k - 1$. Then

$\|x_{k+1} - x_k\| \le \|\Gamma_k F'(x_0)\|\,\|\Gamma_0 F(x_k)\|,$

where $\Gamma_k = F'(x_k)^{-1}$. As $\|x_k - x_0\| \le \|x_k - x_{k-1}\| + \cdots + \|x_1 - x_0\| \le r_k < r^* < R$,

$\|I - \Gamma_0 F'(x_k)\| = \|\Gamma_0[F'(x_k) - F'(x_0)]\| \le \omega(\|x_k - x_0\|) \le \omega(r_k) < \omega(r^*) < 1.$

Then $\Gamma_k F'(x_0)$ exists and $\|\Gamma_k F'(x_0)\| \le 1/(1 - \omega(r_k))$.

Next, by (1), we have the following expression for $F(x_k)$:

$F(x_k) = F(x_k) - F(x_{k-1}) - F'(x_{k-1})(x_k - x_{k-1})$

$\qquad = \int_0^1 [F'(x_{k-1} + t(x_k - x_{k-1})) - F'(x_{k-1})](x_k - x_{k-1})\, dt.$

So, we have

$\|\Gamma_0 F(x_k)\| \le \left\| \int_0^1 \Gamma_0[F'(x_{k-1} + t(x_k - x_{k-1})) - F'(x_0)](x_k - x_{k-1})\, dt \right\| + \left\| \int_0^1 \Gamma_0[F'(x_{k-1}) - F'(x_0)](x_k - x_{k-1})\, dt \right\|$

$\qquad \le \int_0^1 [\omega(r_{k-1} + t(r_k - r_{k-1})) + \omega(r_{k-1})](r_k - r_{k-1})\, dt$

$\qquad \le \int_0^1 \tilde{\omega}(2r_{k-1} + t(r_k - r_{k-1}))(r_k - r_{k-1})\, dt = \int_{2r_{k-1}}^{r_k + r_{k-1}} \tilde{\omega}(s)\, ds.$

Consequently

$\|x_{k+1} - x_k\| \le \frac{1}{1 - \omega(r_k)} \int_{2r_{k-1}}^{r_k + r_{k-1}} \tilde{\omega}(s)\, ds = r_{k+1} - r_k.$

As $\{r_k\}$ converges, $\{x_k\}$ is also a convergent sequence. In addition, if $\lim x_k = x^*$,
then

$\lim_{k \to \infty} \|\Gamma_0 F(x_k)\| = \|\Gamma_0 F(x^*)\| \le \lim_{k \to \infty} \int_{2r_{k-1}}^{r_k + r_{k-1}} \tilde{\omega}(s)\, ds = 0,$

and hence, $F(x^*) = 0$.

Finally, from (11) we have $\|x_{k+m} - x_k\| \le r_{k+m} - r_k$ for all $m \ge 0$ and then
$\|x^* - x_k\| \le r^* - r_k$. In particular, $\|x^* - x_0\| \le r^*$.

To show the uniqueness, notice that under the hypotheses of the theorem,
equation (12) has only one solution, $\tau$. So, let us suppose that $y^* \in B(x_0, \tau)$ is another
solution of $F(x) = 0$. Then

$0 = \Gamma_0[F(x^*) - F(y^*)] = A(x^* - y^*), \qquad A = \int_0^1 \Gamma_0 F'(y^* + t(x^* - y^*))\, dt.$

As the linear operator $A$ is invertible because $\|I - A\| < 1$, we get $y^* = x^*$. □

As a particular case, let us see the behaviour of the sequence defined in (10)
when $\omega(t) = Lt$, with $L$ a positive constant, that is, the center Lipschitz case.
So, we can compare this result with Theorem 3. For $\omega(t) = Lt$ we have

$\tilde{\omega}(s) = \sup\{\omega(u) + \omega(v);\ u + v = s\} = \sup\{Lu + Lv;\ u + v = s\} = Ls.$

Then (10) is now defined by $r_0 = 0$, $r_1 = a$,

$r_{k+1} = r_k + \frac{1}{1 - Lr_k} \int_{2r_{k-1}}^{r_k + r_{k-1}} Ls\, ds = r_k + \frac{L(r_k + 3r_{k-1})}{2(1 - Lr_k)}(r_k - r_{k-1}).$

Let us write $t_k = Lr_k$, for $k \ge 0$. Then, the previous sequence can be
expressed in the following way:

$t_0 = 0, \quad t_1 = aL = h, \quad t_{k+1} - t_k = \frac{t_k + 3t_{k-1}}{2(1 - t_k)}(t_k - t_{k-1}), \quad k \ge 1.$
To study analytically the convergence of the sequence $\{t_k\}$, let us assume that
$t_k \le T$ for $k \ge 0$, where $T$ is a bound that we have to determine. So, if $T < 1/3$, we
have

$\frac{t_k + 3t_{k-1}}{2(1 - t_k)} \le \frac{2T}{1 - T} = M < 1.$

We calculate $t_2 = (2h - h^2)/(2(1 - h))$. Then we can bound $t_k$ in terms of $t_2$:

$t_k \le t_2 + (t_2 - h)\frac{M}{1 - M}, \quad k \ge 3.$

The question is now: when is it true that $t_2 + (t_2 - h)M/(1 - M) \le T$? This is
a quadratic inequality in $T$, namely $3T^2 - (1 + 2h + t_2)T + t_2 \le 0$, which has a
solution if $h \le 0.187472\ldots$ The smallest admissible bound is then

$T = \frac{1 + 2h + t_2 - \sqrt{(1 + 2h + t_2)^2 - 12t_2}}{6}.$

Consequently, we have proved that if $h \le 0.187472\ldots$, then the sequence $\{t_k\}$ is
increasing and satisfies $t_k \le \left(1 + 2h + t_2 - \sqrt{(1 + 2h + t_2)^2 - 12t_2}\right)/6$ for all
$k \ge 0$. Consequently, $\{t_k\}$ is convergent.

Notice that the value of $h$ obtained above improves the value given in
Theorem 3. Besides, this technique shows that the value of $h = aL$ can be improved
by working with $t_3$, $t_4$, etc. In this way, numerical experiments show that this
bound for the product $aL$ can be improved up to a value close to 0.213854.
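The behaviour of the scaled sequence can be explored numerically. The following minimal sketch iterates the recurrence for $t_k$ with $t_0 = 0$, $t_1 = h$, and evaluates the analytic bound $T$ derived above. The function names, the stopping tolerance and the sample values of $h$ are illustrative assumptions.

```python
import math

def scaled_sequence_limit(h, max_iter=10_000, tol=1e-15):
    """Iterate t_{k+1} = t_k + (t_k + 3 t_{k-1}) / (2 (1 - t_k)) * (t_k - t_{k-1}).

    Returns the numerical limit, or None if some t_k reaches 1 (recurrence breaks down).
    """
    t_prev, t = 0.0, h
    for _ in range(max_iter):
        if t >= 1.0:
            return None
        t_next = t + (t + 3.0 * t_prev) / (2.0 * (1.0 - t)) * (t - t_prev)
        if t_next - t < tol:
            return t_next
        t_prev, t = t, t_next
    return None

def analytic_bound(h):
    """Bound T = (1 + 2h + t2 - sqrt((1 + 2h + t2)^2 - 12 t2)) / 6 from the text."""
    t2 = (2.0 * h - h * h) / (2.0 * (1.0 - h))
    s = 1.0 + 2.0 * h + t2
    return (s - math.sqrt(s * s - 12.0 * t2)) / 6.0

limit_small = scaled_sequence_limit(0.18)   # h below 0.187472...
limit_large = scaled_sequence_limit(0.30)   # h well above the reported threshold
```

For $h = 0.18$ the sequence converges and its limit stays below the analytic bound, while for $h = 0.3$ the iterates exceed 1 after a few steps and the recurrence is no longer defined.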



Notes and Comments. In this paper we have analysed the convergence of
Newton’s method by modifying the Lipschitz condition that appears in the classical
Kantorovich conditions. We have also studied the influence of these
changes on the domains of existence and uniqueness of the solution. All the results
considered here are semilocal, that is, all of them include only a condition on
the starting point for Newton’s method.

It would be interesting to analyse the influence of similar changes in the local
study. For instance, what happens if condition (2) is replaced by a condition on
the second derivative or by a center Lipschitz condition? One interesting
reference for finding some answers to this question is the paper of Wang Xinghua [7],
where local results are given under different Lipschitz conditions.
Abstract. The general nonlinear matrix equation $X + A^*X^{-n}A = I$ is
discussed ($n$ is a positive integer). Some necessary and sufficient
conditions for the existence of a solution are given. Two methods for iteratively
computing a positive definite solution are investigated. Numerical
experiments illustrating the performance of the methods are reported.



1 Introduction 

We consider the nonlinear matrix equation

$X + A^*X^{-n}A = I$ (1)

where $X$ is an unknown matrix, $I$ is the identity matrix and $n$ is a positive integer.

The equation $X + A^*X^{-1}A = Q$ has many applications (see [1, 2, 8, 9]).
There are necessary and sufficient conditions for the existence
of a positive definite solution [1, 2]. Effective iterative procedures for solving the
equation $X + A^*X^{-1}A = Q$ have been proposed in [5, 8]. The iterative positive
definite solutions and the properties of the equation $X + A^*X^{-2}A = I$ have
been discussed in [6]. The general nonlinear matrix equation $X + A^*\mathcal{F}(X)A = Q$,
where $\mathcal{F}$ maps positive definite matrices either into positive definite matrices or
into negative definite matrices, and its iterative positive definite solutions have
been investigated in [3]. The notation $Z > Y$ ($Z \ge Y$) indicates that $Z - Y$
is positive definite (semidefinite). The cases when the operator $\mathcal{F}(X)$ is
monotone (if $0 < X \le Y$ then $\mathcal{F}(X) \le \mathcal{F}(Y)$) or anti-monotone (if $0 < X \le Y$
then $\mathcal{F}(X) \ge \mathcal{F}(Y)$) are considered. For instance, the operator $\mathcal{F}(X) = X^r$ is
monotone for $0 < r < 1$ and anti-monotone for $r = -1$ [7].

We derive some necessary and sufficient conditions for solutions of (1). Zhan
and Xie [9] have derived necessary and sufficient conditions for the matrix
equation $X + A^*X^{-1}A = I$ to have a positive definite solution. In this paper we
extend these conditions to the general equation (1). Two iterative processes for
computing a positive definite solution of the equation (1) are studied.

The following notation is used in the paper. $\rho(A)$ is the spectral radius
of $A$. We denote by $\|\cdot\|$ the spectral norm. The notation $Y = \sqrt[n]{Z}$ means that
$Y$, $Z$ are positive definite and $Z = Y^n$. We can compute $Y$ in the following way.
Since $Z$ is a positive definite matrix we have $Z = UDU^*$ where $U$ is unitary
and $D$ is diagonal with positive entries. Hence, $Y = U\sqrt[n]{D}\,U^*$. If $Y \ge Z > 0$ we
have that $\sqrt[n]{Y} \ge \sqrt[n]{Z} > 0$, where $n$ is a positive integer, since $\sqrt[n]{Y}$ is a monotone
operator [7].



2 Necessary and Sufficient Conditions 



In this section we discuss positive definite solutions of the equation (1) where $A$
is a real matrix ($A^* = A^T$).

Theorem 1. The equation (1) has a solution $X$ if and only if the matrix $A$ has
the decomposition

$A = \begin{cases} VW^TZ, & n = 2k + 1,\ k = 0, 1, \ldots \\ VZ, & n = 2k,\ k = 1, 2, \ldots \end{cases}$ (2)

where $V = (W^TW)^k$, $W$ is a nonsingular square matrix and the columns of
$\begin{pmatrix} W \\ Z \end{pmatrix}$ are orthonormal. In this case $X = W^TW$ is a solution and all solutions
can be obtained in this way.

Proof. If (1) has a solution $X > 0$ then we can write $X = W^TW$ where $W$ is a
nonsingular square matrix. We rewrite the equation (1) as

$W^TW + A^T(W^TW)^{-n}A = I,$

$W^TW + Z^TZ = I,$

where

$Z = \begin{cases} W^{-T}(W^{-1}W^{-T})^k A, & n = 2k + 1,\ k = 0, 1, \ldots \\ (W^{-1}W^{-T})^k A, & n = 2k,\ k = 1, 2, \ldots \end{cases}$

Hence

$A = \begin{cases} VW^TZ, & n = 2k + 1,\ k = 0, 1, \ldots \\ VZ, & n = 2k,\ k = 1, 2, \ldots \end{cases}$

and the columns of $\begin{pmatrix} W \\ Z \end{pmatrix}$ are orthonormal.

Conversely, assume $A$ has the decomposition (2) and $X = W^TW$. Then

$X + A^TX^{-n}A = \begin{cases} W^TW + Z^TWV^T(W^TW)^{-n}VW^TZ, & n = 2k + 1 \\ W^TW + Z^TV^T(W^TW)^{-n}VZ, & n = 2k \end{cases}$

$\qquad = W^TW + Z^TZ = I,$

since $V = (W^TW)^k$.

Hence $X$ is a solution. □



Theorem 2. The equation (1) has a solution if and only if there exist orthogonal
matrices $P$ and $Q$ and diagonal matrices $\Theta > 0$ and $\Sigma \ge 0$ such that $\Theta^2 + \Sigma^2 = I$
and $A = P^T\Theta^nQ\Sigma P$. In this case $X = P^T\Theta^2P$ is a solution.

Proof. Assume the equation (1) has a solution $X$. From Theorem 1 it follows
that the matrix $A$ has the factorization

$A = \begin{cases} VW^TZ, & n = 2k + 1,\ k = 0, 1, \ldots \\ VZ, & n = 2k,\ k = 1, 2, \ldots \end{cases}$

where $V = (W^TW)^k$, $W$ is a nonsingular square matrix and the columns of
$\begin{pmatrix} W \\ Z \end{pmatrix}$ are orthonormal. In this case $X = W^TW$.

We extend the matrix $\begin{pmatrix} W \\ Z \end{pmatrix}$ to an orthogonal matrix $\begin{pmatrix} W & U \\ Z & H \end{pmatrix}$.

This matrix can be written as [4]

$\begin{pmatrix} W & U \\ Z & H \end{pmatrix} = \begin{pmatrix} U_1 & 0 \\ 0 & U_2 \end{pmatrix} \begin{pmatrix} \Theta & -\Sigma \\ \Sigma & \Theta \end{pmatrix} \begin{pmatrix} P & 0 \\ 0 & P_2 \end{pmatrix},$

where $U_1$, $U_2$, $P$ and $P_2$ are orthogonal matrices, and $\Theta$ and $\Sigma$ are positive
semidefinite diagonal matrices which satisfy $\Theta^2 + \Sigma^2 = I$, $W = U_1\Theta P$ and $Z = U_2\Sigma P$. Since $W$ is
nonsingular, $\Theta > 0$. Define

$Q = \begin{cases} U_1^TU_2, & n = 2k + 1,\ k = 0, 1, \ldots \\ PU_2, & n = 2k,\ k = 1, 2, \ldots \end{cases}$

Hence $A = P^T\Theta^nQ\Sigma P$.

Conversely, suppose $A = P^T\Theta^nQ\Sigma P$ where $P$ and $Q$ are orthogonal, $\Theta$ and
$\Sigma$ are diagonal matrices and $\Theta > 0$, $\Sigma \ge 0$, $\Theta^2 + \Sigma^2 = I$.

For $X = P^T\Theta^2P$ we obtain

$P^T\Theta^2P + P^T\Sigma Q^T\Theta^nP\,P^T\Theta^{-2n}P\,P^T\Theta^nQ\Sigma P = P^T\Theta^2P + P^T\Sigma^2P = I.$

Hence $X = P^T\Theta^2P$ is a solution of the equation (1). □

Theorem 3. If the equation (1) has a solution then $\|A\| < 1$.

Proof. If (1) has a solution, by Theorem 2 there exist orthogonal
matrices $P$ and $Q$ and diagonal matrices $\Theta$, $\Sigma$ such that $\Theta^2 + \Sigma^2 = I$ and
$A = P^T\Theta^nQ\Sigma P$. Compute

$\|A\| = \|P^T\Theta^nQ\Sigma P\| \le \|P^T\|\,\|\Theta^n\|\,\|Q\|\,\|\Sigma\|\,\|P\| = \|\Theta^n\|\,\|\Sigma\|.$

Since $\Theta$ is a diagonal nonsingular matrix and $\Theta^2 + \Sigma^2 = I$, $\Theta > 0$, $\Sigma \ge 0$, we
obtain $\|\Theta^n\| = \|\Theta\|^n$, $\|\Sigma\| < 1$ and $\|\Theta\| \le 1$.

Hence $\|A\| \le \|\Theta^n\|\,\|\Sigma\| = \|\Theta\|^n\|\Sigma\| < 1$. □



Theorem 4. If the equation (1) has a solution $X$ then






Vejdi Hassanov and Ivan Ivanov 



(i) $I \ge X \ge \sqrt[n]{AA^T}$,

(ii) $I - A^TA - \sqrt[n]{AA^T} \ge 0$,

(iii) $\rho(A) \le \sqrt{\dfrac{n^n}{(n+1)^{n+1}}}$,

(iv) $\rho(A + A^T) \le 1$,

(v) $\rho(A - A^T) \le 1$.

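Proof. In what follows, we use $\lambda(A)$ to denote the set of eigenvalues of $A$.

(i) Since we discuss positive definite solutions of (1), $X > 0$
and $A^TX^{-n}A \ge 0$. Hence $X \le I$ and $A^TX^{-n}A < I$, respectively. From
Theorem 1 it follows that the solution has the form $X = W^TW$ and

$A = \begin{cases} (W^TW)^kW^TZ, & n = 2k + 1,\ k = 0, 1, \ldots \\ (W^TW)^kZ, & n = 2k,\ k = 1, 2, \ldots \end{cases}$

Then

$X - \sqrt[n]{AA^T} = \begin{cases} W^TW - \sqrt[n]{(W^TW)^kW^TZZ^TW(W^TW)^k}, & n = 2k + 1 \\ W^TW - \sqrt[n]{(W^TW)^kZZ^T(W^TW)^k}, & n = 2k \end{cases}$

Since $\lambda(ZZ^T) = \lambda(Z^TZ)$ and $I - Z^TZ = W^TW > 0$ we obtain $ZZ^T \le I$. Therefore

$\sqrt[n]{(W^TW)^kW^TZZ^TW(W^TW)^k} \le W^TW,$

$\sqrt[n]{(W^TW)^kZZ^T(W^TW)^k} \le W^TW.$

Hence $X - \sqrt[n]{AA^T} \ge 0$.

(ii) Using (i) we have $X \le I$, $X^{-n} \ge I$ and $X \ge \sqrt[n]{AA^T}$. Thus
$0 = X + A^TX^{-n}A - I \ge \sqrt[n]{AA^T} + A^TX^{-n}A - I \ge \sqrt[n]{AA^T} + A^TA - I$.

(iii) For the eigenvalues of $A$ we have

$\lambda(A) = \lambda(P^T\Theta^nQ\Sigma P) = \lambda(\Theta^nQ\Sigma) = \lambda(Q\Sigma\Theta^n).$

Moreover

$\rho(A) = \max|\lambda(Q\Sigma\Theta^n)| \le \|Q\Sigma\Theta^n\| = \|\Sigma\Theta^n\|.$

Assume $\Sigma = \mathrm{diag}\{\sigma_i\}$, $\Theta = \mathrm{diag}\{\theta_i\}$. Then $\sigma_i \ge 0$, $\theta_i > 0$ and $\sigma_i^2 + \theta_i^2 = 1$. We
obtain

$\rho(A) \le \|\Sigma\Theta^n\| = \max_i |\sigma_i\theta_i^n| = \max_i \sigma_i(1 - \sigma_i^2)^{n/2} \le \max_{x \in [0,1)} x(1 - x^2)^{n/2} = \sqrt{\frac{n^n}{(n+1)^{n+1}}}.$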
(iv) Consider $I \pm (A^T + A)$ in the cases $n = 2k + 1$ and $n = 2k$. For $n = 2k + 1$,

$I \pm (A^T + A) = Z^TZ + W^TW \pm Z^TW(W^TW)^k \pm (W^TW)^kW^TZ$

$\qquad \ge Z^T(WW^T)^{2k}Z + W^TW \pm Z^TW(W^TW)^k \pm (W^TW)^kW^TZ$

$\qquad = (W \pm W(W^TW)^{k-1}W^TZ)^T(W \pm W(W^TW)^{k-1}W^TZ) \ge 0.$

For $n = 2k$,

$I \pm (A^T + A) = Z^TZ + W^TW \pm Z^T(W^TW)^k \pm (W^TW)^kZ$

$\qquad \ge Z^T(W^TW)^{2k-1}Z + W^TW \pm Z^T(W^TW)^k \pm (W^TW)^kZ$

$\qquad = (W \pm W(W^TW)^{k-1}Z)^T(W \pm W(W^TW)^{k-1}Z) \ge 0.$

Here we used that, since $\lambda(WW^T) = \lambda(W^TW)$ and $I - W^TW = Z^TZ \ge 0$, it follows that $WW^T \le I$.
Hence $\rho(A^T + A) \le 1$.

(v) Consider $\rho(A - A^T)$ in the cases $n = 2k + 1$ and $n = 2k$.

$\rho(A - A^T) = \rho((W^TW)^kW^TZ - Z^TW(W^TW)^k)$

$\qquad \le \rho((W^TW)^{2k+1} + Z^TZ) \le \rho(W^TW + Z^TZ) = 1,$

$\rho(A - A^T) = \rho((W^TW)^kZ - Z^T(W^TW)^k)$

$\qquad \le \rho((W^TW)^{2k} + Z^TZ) \le \rho(W^TW + Z^TZ) = 1.$

Hence $\rho(A - A^T) \le 1$. □
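The spectral-radius bound can be illustrated in the simplest setting. The sketch below treats the scalar ($1 \times 1$) case only, which is a simplifying assumption: for a real number $a$ the equation reads $x + a^2x^{-n} = 1$, and it has a solution in $(0, 1]$ exactly when $a^2$ does not exceed the maximum of $x^n(1 - x)$, that is, when $|a| \le \sqrt{n^n/(n+1)^{n+1}}$. The function names and the grid search are illustrative.

```python
def rho_bound(n):
    """Value sqrt(n^n / (n+1)^(n+1)), the maximum of x (1 - x^2)^(n/2) on [0, 1)."""
    return (n**n / (n + 1) ** (n + 1)) ** 0.5

def has_scalar_solution(a, n, grid=100_000):
    """Check on a grid whether x^n (1 - x) >= a^2 for some x in (0, 1].

    By continuity (x^n (1 - x) vanishes at x = 1) this means that
    x + a^2 x^(-n) = 1 has a solution in (0, 1].
    """
    target = a * a
    return any((k / grid) ** n * (1.0 - k / grid) >= target
               for k in range(1, grid + 1))

n = 3
bound = rho_bound(n)                      # about 0.3248 for n = 3
below = has_scalar_solution(0.32, n)      # |a| below the bound
above = has_scalar_solution(0.33, n)      # |a| above the bound
```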



3 Iteration Methods for Solving the Equation 

We consider iterative processes for solving the equation (1). Conditions for
the convergence of the iterative algorithms are given.

Consider the iterative method

$X_0 = \gamma I, \qquad X_{s+1} = I - A^*X_s^{-n}A,$ (3)

where $\gamma \in \left[\frac{n}{n+1}, 1\right]$.

Theorem 5. If there exist numbers $\alpha$ and $\beta$ such that

(i) $\frac{n}{n+1} < \alpha \le \beta \le 1$;

(ii) $\beta^n(1 - \beta)I \le A^*A \le \alpha^n(1 - \alpha)I$,

then the iterative process (3), with $\alpha \le \gamma \le \beta$, converges to a positive definite
solution $X$ of (1) with linear convergence rate and $X \le I$.

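Proof. We shall show that the matrix sequence $\{X_s\}$ is a Cauchy sequence and that for
each $X_s$ we have $\alpha I \le X_s \le \beta I$.

Suppose $X_0 = \gamma I$ ($\alpha \le \gamma \le \beta$). Obviously $\alpha I \le X_0 \le \beta I$. We get

$\alpha I \le I - \frac{\alpha^n}{\gamma^n}(1 - \alpha)I \le X_1 = I - \frac{A^*A}{\gamma^n} \le I - \frac{\beta^n}{\gamma^n}(1 - \beta)I \le \beta I.$

Hence $\alpha I \le X_1 \le \beta I$. Assume $\alpha I \le X_s \le \beta I$. We obtain

$\frac{1}{\beta^n}I \le X_s^{-n} \le \frac{1}{\alpha^n}I,$

$(1 - \beta)I \le \frac{A^*A}{\beta^n} \le A^*X_s^{-n}A \le \frac{A^*A}{\alpha^n} \le (1 - \alpha)I,$

$I - \beta I \le A^*X_s^{-n}A \le I - \alpha I,$

$\alpha I \le I - A^*X_s^{-n}A \le \beta I.$

Thus $\alpha I \le X_s \le \beta I$, $s = 0, 1, 2, \ldots$ Moreover,

$X_{s+1} - X_s = A^*(X_{s-1}^{-n} - X_s^{-n})A = A^*X_s^{-n}(X_s^n - X_{s-1}^n)X_{s-1}^{-n}A$

$\quad = A^*X_s^{-n}[X_s^{n-1}(X_s - X_{s-1}) + X_s^{n-2}(X_s - X_{s-1})X_{s-1} + \cdots + X_s(X_s - X_{s-1})X_{s-1}^{n-2} + (X_s - X_{s-1})X_{s-1}^{n-1}]X_{s-1}^{-n}A$

$\quad = A^*X_s^{-1}(X_s - X_{s-1})X_{s-1}^{-n}A + \cdots + A^*X_s^{-n}(X_s - X_{s-1})X_{s-1}^{-1}A,$

$\|X_{s+1} - X_s\| \le \|A\|^2\left[\|X_s^{-1}\|\,\|X_{s-1}^{-n}\| + \cdots + \|X_s^{-n}\|\,\|X_{s-1}^{-1}\|\right]\|X_s - X_{s-1}\| \le \frac{n\|A\|^2}{\alpha^{n+1}}\|X_s - X_{s-1}\|.$

Since $\alpha > \frac{n}{n+1}$, we have $q = \frac{n\|A\|^2}{\alpha^{n+1}} < 1$. Hence $\|X_{s+1} - X_s\| \le q^s\|X_1 - X_0\|$.

Moreover

$\|X_{s+p} - X_s\| \le q^s\|X_p - X_0\| \le q^s(\|X_p - X_{p-1}\| + \cdots + \|X_1 - X_0\|)$

$\quad \le q^s(q^{p-1} + \cdots + 1)\|X_1 - X_0\| < \frac{q^s}{1 - q}\|X_1 - X_0\|.$

Since $q < 1$, $\lim_{s \to \infty} \frac{q^s}{1 - q} = 0$.

Consequently $\{X_s\}$ is a Cauchy matrix sequence. Since the matrix space is a Banach
space, $\{X_s\}$ converges to a positive definite solution of (1). □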
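The fixed-point iteration can be tried in the scalar ($1 \times 1$) case, which is a simplification chosen here so that no matrix library is needed; the value $a = 0.2$, the tolerance and the function name are illustrative assumptions. For a real number $a$ the iteration reads $x_{s+1} = 1 - a^2/x_s^n$ with $x_0 = \gamma$.

```python
def iterate_x(a, n, gamma=1.0, tol=1e-12, max_iter=1000):
    """Scalar version of X_{s+1} = I - A* X_s^{-n} A, started at X_0 = gamma * I."""
    x = gamma
    for s in range(max_iter):
        x_new = 1.0 - a * a / x**n
        if abs(x_new - x) < tol:
            return x_new, s + 1
        x = x_new
    raise RuntimeError("no convergence within max_iter iterations")

a, n = 0.2, 3
x_sol, steps = iterate_x(a, n, gamma=1.0)
residual = abs(x_sol + a * a / x_sol**n - 1.0)   # |x + a^2 x^{-n} - 1|
```

In this sketch the limit lies in $(n/(n+1), 1]$, the interval in which the theorem places $\alpha$ and $\beta$, and the convergence is linear with a contraction factor of roughly $n a^2 / x^{n+1}$.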

We consider the second iterative process

$Y_0 = \xi I, \qquad Y_{s+1} = \sqrt[n]{A(I - Y_s)^{-1}A^*},$ (4)

where $\xi \in \left[0, \frac{n}{n+1}\right]$.

Theorem 6. If there exist numbers $\eta$ and $\gamma$ such that

(i) $0 \le \eta \le \gamma \le \frac{n}{n+1}$;

(ii) $\eta^n(1 - \eta)I \le AA^* \le \gamma^n(1 - \gamma)I$,

then the iterative process (4), with $\xi = \eta$ and with $\xi = \gamma$, converges to a positive
definite solution $Y$ of (1) and $Y \le \gamma I$.

Proof. Consider the case $Y_0 = \xi I$ ($\xi = \eta$). We shall show that the matrix
sequence $\{Y_s\}$ is a monotonically increasing sequence and that for each $Y_s$ we have
$\eta I \le Y_s \le \gamma I$.
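According to (ii) we have

$\eta I = \sqrt[n]{\frac{\eta^n(1 - \eta)}{1 - \eta}}\, I \le \sqrt[n]{\frac{AA^*}{1 - \eta}} = Y_1 \le \sqrt[n]{\frac{AA^*}{1 - \gamma}} \le \sqrt[n]{\frac{\gamma^n(1 - \gamma)}{1 - \gamma}}\, I = \gamma I.$

Hence $Y_0 \le Y_1 \le \gamma I$. Assume that $Y_{s-1} \le Y_s \le \gamma I$.
Thus

$Y_{s+1} = \sqrt[n]{A(I - Y_s)^{-1}A^*} \ge \sqrt[n]{A(I - Y_{s-1})^{-1}A^*} = Y_s.$

Since $Y_s \le \gamma I$ it follows that $A(I - Y_s)^{-1}A^* \le \frac{AA^*}{1 - \gamma} \le \gamma^nI$. Hence
$Y_s \le Y_{s+1} \le \gamma I$ for $s = 0, 1, \ldots$ □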

Remark 1. If $Y_0 = \gamma I$ it can be proved that $\{Y_s\}$ is a monotonically decreasing
sequence.

Theorem 7. If the equation (1), where $A$ is real, has a positive definite
solution $Y$, then the iterative process (4) with $Y_0 = \sqrt[n]{AA^T}$ converges to the
smallest positive definite solution $Y_{\min}$.

Proof. We shall show that the sequence $\{Y_s\}$ is monotonically increasing
and bounded above by any positive definite solution $Y$. We have
$0 < I - \sqrt[n]{AA^T} \le I$ since $I > \sqrt[n]{AA^T}$ by Theorem 4. Then $(I - \sqrt[n]{AA^T})^{-1} \ge I$
and $A(I - \sqrt[n]{AA^T})^{-1}A^T \ge AA^T$. We can write

$Y_1 = \sqrt[n]{A(I - Y_0)^{-1}A^T} \ge Y_0 = \sqrt[n]{AA^T}.$

We assume that $Y_s \ge Y_{s-1}$. It is easy to show that $Y_{s+1} \ge Y_s$.

Hence the sequence $\{Y_s\}$ is monotonically increasing.

Let $Y$ be any positive definite solution of the equation (1). From Theorem 4
(i) we obtain $Y \ge \sqrt[n]{AA^T} = Y_0$.

We suppose that $Y_s \le Y$. We shall prove $Y_{s+1} \le Y$. We have

$(I - Y_s)^{-1} \le (I - Y)^{-1},$

$Y_{s+1} = \sqrt[n]{A(I - Y_s)^{-1}A^T} \le \sqrt[n]{A(I - Y)^{-1}A^T} = Y.$

Hence $Y_{s+1} \le Y$ for all $s$ and each positive definite solution $Y$ of (1).
Thus $\{Y_s\}$ converges to the smallest positive definite solution $Y_{\min}$ of the
equation (1). □
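The second iteration can likewise be tried in the scalar ($1 \times 1$) case, again a simplification made here to keep the sketch free of matrix libraries; $a = 0.2$, $n = 3$ and the function name are illustrative assumptions. For a real number $a$ the process reads $y_{s+1} = \left(a^2/(1 - y_s)\right)^{1/n}$ with $y_0 = (a^2)^{1/n}$.

```python
def iterate_y(a, n, tol=1e-12, max_iter=1000):
    """Scalar version of Y_{s+1} = (A (I - Y_s)^{-1} A^T)^{1/n}, Y_0 = (A A^T)^{1/n}."""
    y = (a * a) ** (1.0 / n)
    history = [y]
    for _ in range(max_iter):
        y_new = (a * a / (1.0 - y)) ** (1.0 / n)
        history.append(y_new)
        if abs(y_new - y) < tol:
            return y_new, history
        y = y_new
    raise RuntimeError("no convergence within max_iter iterations")

a, n = 0.2, 3
y_sol, history = iterate_y(a, n)
residual = abs(y_sol + a * a / y_sol**n - 1.0)   # |y + a^2 y^{-n} - 1|
```

In this sketch the iterates increase monotonically from $y_0$ and settle at the smaller of the two positive solutions of $y + a^2y^{-n} = 1$, in line with the statement that the process converges to the smallest positive definite solution.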

4 Numerical Experiments 

We carry out numerical experiments for computing the positive definite solutions
of the equation (1) with $n = 3$ in MATLAB on a PENTIUM computer. We
use the methods (3) and (4) considered above. As a practical stopping criterion we use
$\varepsilon = \|Z + A^TZ^{-n}A - I\|_\infty \le tol$ with $tol = 10^{-8}$.
Example 1. Consider the matrix




16 -9 -8 
II 16 5 
4 -8 18 



We compute the solution $X$ using the method (3) with different values of
$\gamma$ and the method (4) with different values of $\xi$. The method (3) ($X_0 = \gamma I$)
needs 8 iterations with $\gamma = 1$ and 7 iterations with $\gamma = 0.955$. With
$\gamma = 0.951$ it needs 7 iterations and with $\gamma = 0.75$ it needs 10 iterations. The
method (4) ($Y_0 = \xi I$) needs 13 iterations with $\xi = 0$ and 13 iterations with
$\xi = 0.403$. With $\xi = 0.414$ it needs 13 iterations and with $\xi = 0.75$ it
needs 14 iterations.

5 Conclusion 

In this paper we introduced the general nonlinear matrix equation (1). We have
studied some of its properties and two recurrence algorithms. The recurrence equation
(3) defines a monotone matrix sequence $\{X_s\}$ ($\gamma = 1$) for the equation
$X + A^*X^{-1}A = I$ [2], which has a limit. It is proven that this limit is the
largest positive definite solution. We expect that the matrix sequence $\{X_s\}$ of (3)
converges to the largest positive definite solution of the general equation. The
matrix sequence $\{Y_s\}$ converges to the smallest positive definite solution of (1)
for the special initial point.
1 Introduction



Let $a_k$, $b_k$ ($k = 0, 1, \ldots, 2n$) be given real numbers, where $b_0 = 0$, $b_k \ne 0$ for
$k > 0$, and $e_{-1} = 0$, $e_0(t) = 1$. Then the three-term recursion

$b_{j+1}e_{j+1}(t) = (t - a_j)e_j(t) - b_je_{j-1}(t)$ (1)

defines a system of polynomials $e_j(t) = \sum_{i=0}^{j} e_{ij}t^i$ ($j = 0, 1, \ldots, 2n$), where
$e_{jj} \ne 0$. We introduce the matrices $E_n = [e_{ij}]_{i,j=0}^{n}$ with $e_{ij} = 0$ for $i > j$.

In this paper we consider $(n+1) \times (n+1)$ matrices of the form $R_n = E_n^TH_nE_n$,
where $H_n = [h_{j+k}]_{j,k=0}^{n}$ is a Hankel matrix. We call matrices of this form
OP-Hankel matrices, where “OP” stands for “orthogonal polynomials”. This name
should point out that orthogonal polynomials satisfy a three-term recursion (1).

Let us mention some instances where OP-Hankel matrices appear. The most
familiar one seems to be modified moment problems. In fact, for orthogonal
polynomials $\{e_j(t)\}$ on the real line, the modified moment matrices with entries

$r_{ij} = \int e_i(t)e_j(t)\, d\sigma,$ (2)

where $\sigma$ is some (not necessarily positive) measure on the real line, are
OP-Hankel matrices. Some general references for this are [5], [4], [3]. OP-Hankel
matrices also appear in least squares problems for OP expansions. In this connection
OP-Hankel matrices were introduced and studied in [8]. In [2] OP-Hankel
matrices were used for preconditioning of ill-conditioned Hankel systems. This is
based on the remarkable fact that positive definite OP-Hankel matrices can be



L. Vulkov, J. Wasniewski, and P. Yalamov (Eds.): NAA 2000, LNCS 1988, pp. 385—392, 2001. 
well conditioned, whereas positive definite Hankel matrices are always ill
conditioned. Finally, in [6] it is shown that symmetric Toeplitz matrices and, more
generally, centrosymmetric Toeplitz-plus-Hankel matrices are unitarily equivalent
to a direct sum of two special Chebyshev OP-Hankel matrices.

An inversion algorithm for $n \times n$ OP-Hankel matrices with complexity $O(n^2)$
was, as far as we know, first presented in [8]. More algorithms with this
complexity are contained in [3]. The algorithms in [8] and [3] are Levinson-type
algorithms. The disadvantage of Levinson-type algorithms compared with
Schur-type algorithms is that they cannot be fully parallelized and sped up to superfast
algorithms.

A Schur-type algorithm and the corresponding superfast $O(n\log^2 n)$
complexity solver for the special case of Chebyshev-Hankel matrices is presented
in [6]. This also leads to a superfast Toeplitz solver based on real arithmetic.

In this paper the approach from [6] is generalized to arbitrary OP-Hankel
matrices. The basic fact of our approach is that OP-Hankel matrices can be
described in three different ways: firstly, they are matrices of the form $R_n = E_n^TH_nE_n$;
secondly, they are matrices for which the “displacement” $R_nT_n - T_nR_n$ has
rank 2 and a special structure, where $T_n$ is the tridiagonal matrix defined in
Section 2; and finally, they are the matrices of restricted multiplication operators
with respect to the basis $\{e_k(t)\}$.

The last interpretation enables us to derive immediately Levinson- and
Schur-type algorithms for LU-factorization of strongly nonsingular OP-Hankel
matrices and their inverses. The combination of the Levinson and the Schur-type
algorithms can be used to speed up the algorithm for the solution of OP-Hankel
systems to complexity $O(n\log^2 n)$.



2 Displacement Structure 



Throughout the paper, let $T_m$ denote the $(m+1) \times (m+1)$ tridiagonal matrix

$T_m = \begin{pmatrix} a_0 & b_1 & & \\ b_1 & a_1 & \ddots & \\ & \ddots & \ddots & b_m \\ & & b_m & a_m \end{pmatrix}.$

We consider the commutator (or displacement) transformation $\nabla R_n = R_nT_n - T_nR_n$.
Since all eigenvalues of $T_n$ are simple, the kernel of $\nabla$ has dimension $n+1$.
Furthermore, $R_n$ can be reproduced from $\nabla R_n$ and the first or the last column
of $R_n$. In fact, the following is easily checked.

Proposition 1. Let $r_k$ denote the $(k+1)$th column of $R_n$ and $t_k$ the $(k+1)$th
column of $\nabla R_n$ ($k = 0, \ldots, n$). Then

$r_{k+1} = \frac{1}{b_{k+1}}\left((T_n - a_kI_n)r_k - b_kr_{k-1} + t_k\right), \qquad r_{k-1} = \frac{1}{b_k}\left((T_n - a_kI_n)r_k - b_{k+1}r_{k+1} + t_k\right),$

with $r_{-1} = r_{n+1} = 0$.
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Let Tin denote the space of all matrices Rn for which Vi?„ has the form 

$$\nabla R_n = g e_n^T - e_n g^T \qquad (3)$$

for some $g \in \mathbb{R}^{n+1}$. Obviously, we may assume that the last component of $g$ is
zero. We shall show that $\mathcal{H}_n$ is just the set of all OP-Hankel matrices
corresponding to the data $a_k$ and $b_k$. For this we mention first that from the fact that
the kernel of $\nabla$ has dimension $n+1$ and $g$ has $n$ degrees of freedom it follows
that

$$\dim \mathcal{H}_n \le 2n + 1. \qquad (4)$$

Next we observe the following. 

Proposition 2. An OP-Hankel matrix Rn = E^HnEn belongs to Tin and (5) 
holds with 

g — bnX n—l (Fn andn+l^'^n- 

Proof. We may extend $H_n$ to an infinite Hankel matrix $H_\infty$. For the
corresponding OP-Hankel matrix $R_\infty$ we have $R_\infty T_\infty = T_\infty R_\infty$. Taking the first
$n+1$ rows and columns of this relation we obtain the assertion. □

Since the dimension of the space of all $(n+1) \times (n+1)$ Hankel matrices
equals $2n+1$, the dimension of the space of all $(n+1) \times (n+1)$ OP-Hankel
matrices also equals $2n+1$. Together with (4), the following is true.

Corollary 1. Any matrix $R_n \in \mathcal{H}_n$ admits a representation $R_n = E_n^T H_n E_n$,
where $H_n$ is an $(n+1) \times (n+1)$ Hankel matrix.

We give now a third characterization of OP-Hankel matrices. Let denote 
the space of all polynomials of degree less than or equal to n with real coefficients 
and Vn the projection defined for polynomials x(t) = ^kS-kit) by Vnx{t) = 

^kGkit). Note that Vnt’^ = 0 if fc > 2n. Furthermore, for a given polynomial 
x{t), let [x{t)]k denote its coefficient in its expansion by {ek{t)}, i.e. if x{t) = 
J2k=o^kek{t), then [x{t)]k = Xk. 

For a given polynomial $p(t)$ of degree less than or equal to $2n$, let $\mathcal{R}_n(p)$
denote the operator in $\mathbb{R}_n[t]$ defined by $\mathcal{R}_n(p)x(t) = \mathcal{P}_n\, p(t)x(t)$. For $p(t) = t$ we
set $S_n := \mathcal{R}_n(p)$.

Proposition 3.

$$(\mathcal{R}_n(p) S_n - S_n \mathcal{R}_n(p))\, x(t) = b_{n+1}\, [p(t)x(t)]_{n+1}\, e_n(t) - b_{n+1}\, g(t)\, [x(t)]_n ,$$

where $g(t) = \mathcal{P}_n\, p(t) e_{n+1}(t)$.

The proof is a straightforward verification. 

Let $R_n(p)$ denote the matrix of the operator $\mathcal{R}_n(p)$ with respect to the basis
$\{e_k(t)\}$. In particular we have $R_n(1) = I_n$ and $R_n(t) = T_n$. Furthermore,

$$R_n(p) = \begin{bmatrix} I_n & 0 \end{bmatrix} p(T_N) \begin{bmatrix} I_n \\ 0 \end{bmatrix}$$
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for any N > 2n, and the relation in Proposition 2.2 can be written in the form 

Rn{p)Tn '^nRnip) — ^n+lidn^n ^ri9n)^ (^) 

where g is the coefficient vector of g{t) with respect to expansion of g{t) by 

That means the matrices $R_n(p)$ belong to the class $\mathcal{H}_n$ and are, therefore,
OP-Hankel matrices. Since the mapping $p(t) \to R_n(p)$ is one-to-one for $p(t) \in \mathbb{R}_{2n}[t]$,
the dimension of the space of matrices $R_n(p)$ equals $2n+1$. This leads to
the main result of this section.

Theorem 1. For an $(n+1) \times (n+1)$ matrix $R_n$, the following are equivalent:

1. The matrix $R_n$ is of the form $R_n = E_n^T H_n E_n$ for some Hankel matrix $H_n$.

2. The commutator $\nabla R_n$ satisfies (5) for some $g \in \mathbb{R}^{n+1}$.

3. For some polynomial $p(t) \in \mathbb{R}_{2n}[t]$, $R_n = R_n(p)$.

If Rn is given in the form i?„ = E'^HEn with H = [hi+j]2j=o, then the 
coefficient vector p of p{t) with respect to the basis {ek{t)} is given by p = 
where h = [hkllT^. If Rn is given by (2) then the coefficients of p{t) are the 
numbers (i = 0, . . . , 2n). 
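The third characterization lends itself to a quick numerical check. The sketch below is only an illustration under stated assumptions: for simplicity the polynomial $p$ is given by ordinary monomial coefficients rather than by its expansion in $\{e_k(t)\}$, the helper names are hypothetical, and $N = 2n+1$ is one admissible choice of $N > 2n$. It builds $R_n(p)$ as the leading block of $p(T_N)$ and forms the commutator with $T_n$, which should vanish outside the last row and the last column.

```python
def matmul(A, B):
    """Dense matrix product of two lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def tridiagonal(a, b):
    """Symmetric tridiagonal matrix with diagonal a and off-diagonals b[1:]."""
    m = len(a)
    T = [[0.0] * m for _ in range(m)]
    for k in range(m):
        T[k][k] = a[k]
        if k + 1 < m:
            T[k][k + 1] = T[k + 1][k] = b[k + 1]
    return T


def poly_of_matrix(coeffs, T):
    """p(T) by Horner's rule; coeffs[i] multiplies t**i (monomial basis)."""
    m = len(T)
    P = [[0.0] * m for _ in range(m)]
    for c in reversed(coeffs):
        P = matmul(P, T)
        for i in range(m):
            P[i][i] += c
    return P


def op_hankel(coeffs, a, b, n):
    """Leading (n+1) x (n+1) block of p(T_N) with N = 2n + 1 (assumption)."""
    N = 2 * n + 1
    P = poly_of_matrix(coeffs, tridiagonal(a[:N + 1], b[:N + 1]))
    return [row[:n + 1] for row in P[:n + 1]]


def commutator(R, T):
    """R T - T R."""
    RT, TR = matmul(R, T), matmul(T, R)
    m = len(R)
    return [[RT[i][j] - TR[i][j] for j in range(m)] for i in range(m)]
```

For a polynomial of degree at most $2n$, `commutator(op_hankel(coeffs, a, b, n), tridiagonal(a[:n+1], b[:n+1]))` is expected to be antisymmetric with nonzero entries only in its last row and last column, in line with the rank-2 displacement structure.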



3 Algorithms for LU-Factorization 



In this and the next sections we consider only strongly nonsingular OP-Hankel
matrices $R_n = R_n(p) = [r_{ij}]_{i,j=0}^n$. That means we assume that the principal
subsections $[r_{ij}]_{i,j=0}^k$ are nonsingular for $k = 0, \dots, n$. This covers, in particular,
the case when $R_n$ is positive definite.

We seek fast algorithms for the LU-factorization of $R_n$ and its inverse. More
precisely, we are looking for an upper triangular matrix $U_n = [u_{ij}]_{i,j=0}^n$ and a
lower triangular matrix $L_n = [l_{ij}]_{i,j=0}^n$ satisfying

$$R_n U_n = L_n \quad \text{and} \quad u_{ii} = 1 \quad (i = 0, \dots, n). \qquad (1)$$

In polynomial language this can be written in the form

$$p(t)\, u_k(t) = l_k(t), \qquad (2)$$

where

$$u_k(t) = \sum_{i=0}^{k} u_{ik}\, e_i(t), \qquad l_k(t) = \sum_{i=k}^{2n} l_{ik}\, e_i(t).$$

Theorem 2. The columns of $U_n$ and $L_n$ in (1) can be computed via the recursion

$$b_{k+1} u_{k+1}(t) = (t - \alpha_k) u_k(t) - \beta_k u_{k-1}(t),$$
$$b_{k+1} l_{k+1}(t) = (t - \alpha_k) l_k(t) - \beta_k l_{k-1}(t),$$

where $k = 0, \dots, n-1$,

$$\beta_k = \frac{b_k\, l_{kk}}{l_{k-1,k-1}}, \qquad \alpha_k = a_k - \frac{b_k\, l_{kk}\, l_{k,k-1} - b_{k+1}\, l_{k+1,k}\, l_{k-1,k-1}}{l_{kk}\, l_{k-1,k-1}}.$$
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This theorem can be proved by straightforward verification. 

The initial polynomials $u_0(t)$, $l_0(t)$ are given by $u_0(t) = 1$ and $l_0(t) = p(t)$.
The recursions can easily be translated into vector language using the fact
that the matrix of the operator of multiplication by $t$ with respect to the basis
$\{e_k(t)\}$ is equal to $T_{2n}$.

The algorithm emerging from the theorem is a hybrid Levinson-Schur type
algorithm. It is in particular convenient for parallel computation and has O(n)
complexity if n processors are available.

It is possible to calculate only the columns of the upper factor $U_n$, and the
quantities $l_{ij}$ for $0 \le i - j \le 1$ as inner products of rows of $R_n$ and the $u_k$.
This leads to a Levinson-type algorithm. It is also possible to calculate only the
lower factor $L_n$, which results in a pure Schur-type algorithm. In this case the
solution of a system $R_n x = b$ will be obtained by backward substitution.
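The hybrid recursion can be illustrated in "vector language" with a short sketch. It is a plain-Python illustration under assumptions, not the authors' code: coefficient lists are taken with respect to $\{e_k(t)\}$, multiplication by $t$ uses $t e_k = b_k e_{k-1} + a_k e_k + b_{k+1} e_{k+1}$, and $\alpha_k$, $\beta_k$ are chosen so that the coefficients of $l_{k+1}$ at $e_{k-1}$ and $e_k$ vanish. The function names are hypothetical, and `b[0]` is an unused placeholder.

```python
def mul_by_t(c, a, b):
    """Coefficients of t*x(t) in the basis {e_k}, given coefficients c of x(t)."""
    out = [0.0] * (len(c) + 1)
    for k, ck in enumerate(c):
        if k > 0:
            out[k - 1] += b[k] * ck
        out[k] += a[k] * ck
        out[k + 1] += b[k + 1] * ck
    return out


def combine(x, alpha, y, beta, z, scale):
    """(x - alpha*y - beta*z) / scale, padding shorter lists with zeros."""
    def get(v, i):
        return v[i] if i < len(v) else 0.0
    return [(x[i] - alpha * get(y, i) - beta * get(z, i)) / scale
            for i in range(len(x))]


def lu_columns(p, a, b, n):
    """Coefficient lists of u_0..u_n and l_0..l_n, starting from
    u_0 = 1 and l_0 = p; the same three-term recursion is applied to both."""
    u_prev, l_prev = [], []
    u, l = [1.0], list(p)
    U, L = [u], [l]
    for k in range(n):
        if k == 0:
            beta = 0.0
            alpha = a[0] + b[1] * l[1] / l[0]
        else:
            beta = b[k] * l[k] / l_prev[k - 1]
            alpha = (a[k] + b[k + 1] * l[k + 1] / l[k]
                     - b[k] * l_prev[k] / l_prev[k - 1])
        u_next = combine(mul_by_t(u, a, b), alpha, u, beta, u_prev, b[k + 1])
        l_next = combine(mul_by_t(l, a, b), alpha, l, beta, l_prev, b[k + 1])
        u_prev, l_prev, u, l = u, l, u_next, l_next
        U.append(u)
        L.append(l)
    return U, L
```

In the returned lists, `U[k][k]` should equal 1 and `L[k][i]` should vanish for `i < k`, which is the triangular structure required in the factorization. The lists `a` and `b` must be long enough to cover the growing degree of the $l_k$ (roughly $3n$ entries).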

4 OP-Bezoutians 

Apparently it makes no sense to ask for LU-factorization algorithms that require
less than $O(n^2)$ operations, but it makes sense to ask for such algorithms to solve
systems of equations. In the case of Hankel matrices one can make use of the
fact that inverses of Hankel matrices are Bezoutians, which are matrices $B = [b_{jk}]$
such that the "generating function" $B(t,s) = \sum_{j,k} b_{jk}\, t^j s^k$ equals
$(u(t)v(s) - v(t)u(s)) / (t - s)$, where $u(t)$, $v(t)$ are certain polynomials. Once $u$ and $v$ are
given, a Hankel system can be solved by matrix-vector multiplication, which can
be carried out with O(n log n) complexity if the FFT is used.

This leads us to the definition of OP-Bezoutians. For our given system of
polynomials $E = \{e_j(t)\}$ and a given matrix $B = [b_{jk}]$ we define the "E-generating
function"

$$B_E(t,s) = \sum_{j,k} b_{jk}\, e_j(t)\, e_k(s).$$

A matrix $B$ is called an E-Bezoutian if $B_E(t,s) = (u(t)v(s) - v(t)u(s))/(t - s)$.
Since OP-Hankel matrices admit a representation $R_n = E_n^T H_n E_n$, we conclude
the following.

Proposition 4. Inverses of OP-Hankel matrices are OP-Bezoutians. 

Let us mention that the polynomials $u(t)$ and $v(t)$ are, up to a constant factor,
equal to $u_n(t)$ and $u_{n+1}(t)$ introduced in the previous section. That means, in
order to solve systems of equations with the coefficient matrix $R_n$ it is sufficient
to store these two polynomials.

5 Fast Polynomial Multiplication 

In order to obtain algorithms for the solution of systems with an OP-Hankel
coefficient matrix we need an algorithm for fast multiplication of polynomials
in OP-expansions. For this we can use the approximate algorithms from [1],
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but we can also use the exact algorithms described in [13]. In the latter paper 
(2iV + 1) X {N + 1) matrices of the form 

Vn = 

with Cj = cos ^ are considered and algorithms are presented that multiply a 
vector by Vn or by with complexity 0(iV log^ N) and resonable accuracy. 
We need the following property, which is mentioned in [13]. 

Proposition 5. Let w be the first column of the inverse of the matrix Vn = 
[efc(cj) ]|^^Q, and let be the diagonal matrix D^, = diag [iCj]|^g. Then 



VnDwVn = In+1- 



Once the weight vector $w$ is precomputed, it is clear how to multiply two
polynomials. First we choose $N > 2(m + n)$ and multiply the matrix $V_N$ by the
coefficient vectors of $x(t)$ and $y(t)$, which means that the values of $x(t)$ and $y(t)$
at the $c_j$ are computed. Then the computed values are multiplied by each other and
by $w_j$, and $V_N^T$ is applied to obtain the coefficient vector of the product in
the expansion by $\{e_k(t)\}$.
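The evaluate, multiply, transform-back pattern described above can be sketched for one concrete case. The following is a dense, quadratic-cost illustration only, under assumptions that are not taken from the paper: the basis is the normalized Chebyshev system $e_0 = 1$, $e_k = \sqrt{2}\,T_k$, the nodes are the Gauss-Chebyshev points (for which all weights equal $1/M$), and the fast transforms of [13] are replaced by plain sums.

```python
import math


def basis_values(deg, t):
    """Values e_0(t), ..., e_deg(t) with e_0 = 1 and e_k = sqrt(2) * T_k(t)."""
    vals = [1.0]
    T_prev, T_cur = 1.0, t
    for _ in range(1, deg + 1):
        vals.append(math.sqrt(2.0) * T_cur)
        T_prev, T_cur = T_cur, 2.0 * t * T_cur - T_prev
    return vals


def multiply_in_basis(x, y):
    """Coefficients of x(t)*y(t) in {e_k}, given the coefficients of x and y."""
    d = len(x) + len(y) - 2            # degree of the product
    M = d + 1                          # number of nodes (assumed sufficient)
    nodes = [math.cos((2 * j + 1) * math.pi / (2 * M)) for j in range(M)]
    V = [basis_values(d, c) for c in nodes]          # V[j][k] = e_k(c_j)
    xv = [sum(V[j][k] * x[k] for k in range(len(x))) for j in range(M)]
    yv = [sum(V[j][k] * y[k] for k in range(len(y))) for j in range(M)]
    w = 1.0 / M                        # equal weights for these nodes
    prod = [w * xv[j] * yv[j] for j in range(M)]
    # apply the transpose of V to the weighted product values
    return [sum(V[j][k] * prod[j] for j in range(M)) for k in range(M)]
```

For example, `multiply_in_basis([0.0, 1.0], [0.0, 1.0])` multiplies $e_1$ by itself; since $e_1^2 = 2t^2 = e_0 + e_2/\sqrt{2}$, the result should be close to `[1.0, 0.0, 0.7071...]`.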

Let us note that the algorithms briefly sketched in this section can also be
used for fast matrix-vector multiplication by OP-Bezoutians. That means if the
data in the OP-Bezoutians are given, then an $n \times n$ system with an OP-Hankel
coefficient matrix can be solved with complexity O(nlog^n). This complexity
reduces to O(n log n) in the case of Chebyshev polynomials.



6 Superfast Algorithm 



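We show now how an algorithm with complexity O(nlog^n) to find $u_n(t)$ and
$u_{n+1}(t)$, which are required for the solution of OP-Hankel systems, can be
designed. We introduce $2 \times 2$ matrix polynomials

$$U_k(t) = \begin{bmatrix} u_k(t) & u_{k-1}(t) \\ l_k(t) & l_{k-1}(t) \end{bmatrix}, \qquad \Theta_k(t) = \frac{1}{b_{k+1}} \begin{bmatrix} t - \alpha_k & b_{k+1} \\ -\beta_k & 0 \end{bmatrix}.$$

Then the relation in Theorem 3.1 can be written in the form

$$U_{k+1}(t) = U_k(t)\, \Theta_k(t). \qquad (3)$$

We define, for $j > k$,

$$\Theta_{kj}(t) = \Theta_k(t)\, \Theta_{k+1}(t) \cdots \Theta_{j-1}(t).$$

Then, for $j > i > k$,

$$\Theta_{kj}(t) = \Theta_{ki}(t)\, \Theta_{ij}(t), \qquad U_j(t) = U_k(t)\, \Theta_{kj}(t). \qquad (4)$$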
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In order to achieve complexity O(nlog^n) it is important to carry out the 
calculations not with the complete polynomials lk(t) but only with the relevant 
part of them. We define 

j 

m = E ^ki^i (^) • 

i=k 

It is easily checked that Ok,k+i{t) can be computed from and 

Ok,k+ 2 {t) from and lk^^{t) and, in general, 0kj(t) from and 

^k~^~^{t). Furthermore, the following is true for k < i < j: 



j2j-i-2 

H-l 



it) m 



['P2j-i-2hi-l{t) P2j-i-lhi{t)] , 



( 5 ) 



where 



[hi-i{t) hi{t)] 



^i-iit) liit) 



0ki{t). 



This leads to the following recursive procedure. 
Input: ll^~’^~^{t)] , Output: 0kj{t) 



(6) 



1. If j = fc + 1 then apply Theorem 3.1. 

2. Otherwise choose i with k < i < j and carry out the following steps: 

(a) Apply the Procedure for [lk^~^~‘^it) lk^~i~^it)]. The output is Oki{t). 

(b) Compute [t) by (5) and (6). 

(c) Apply the Procedure for [l{{t) The output is Oij{t). 

(d) Compute 0kj{t) = 0ki{t)0ijit) using a fast algorithm (as described in 
Section 4). 



It is convenient to choose $i$ close to the average of $j$ and $k$. Proceeding in
this way, the problem of computing $\Theta_{kj}(t)$ is reduced to two subproblems of about
half the size plus O((j - k) log^(j - k)) operations for polynomial multiplication.
This ends up with complexity O((j - k) log^(j - k)). In particular, $u_n(t)$ can be
computed with O(nlog^n) operations.



7 Other Approaches 

Let us briefly mention some other approaches to solving linear systems with an
OP-Hankel coefficient matrix. The first one is described in [9]. It is based on
displacement structure and Schur complements and is applicable to matrices $R$ for
which the rank of $T_1 R - R T_2$, where $T_1$ and $T_2$ are tridiagonal matrices, is small
compared with the size of the matrix $R$. This approach, however, does not fully
use the specifics of OP-Hankel matrices.

The second approach is based on transformation into Cauchy-like matrices
(see [7]) or into a tangential interpolation problem (as in [6] for Chebyshev-Hankel
matrices) and the application of the algorithm described in [12]. For this
the eigenvalues and eigenvectors of the matrix $T$ have to be precomputed.
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Finally, a convenient basis change transforms a general OP-Hankel matrix 
into a Chebyshev-Hankel matrix. The basis change can be carried out with the
help of the algorithms described in [13] with O(nlog^n) complexity. For the
resulting Chebyshev-Hankel system one could use the O(nlog^n) complexity
algorithm described in [6]. This leads to an O(nlog^n) complexity algorithm for
general OP-Hankel systems. However, it is possible that the change of the basis
increases the condition number of the matrix substantially, so that the numerical
application of this approach might be restricted.
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Abstract. For singularly perturbed convection-diffusion problems with 
the perturbation parameter ε multiplying the highest derivatives, we construct
a scheme based on the defect correction method and its parallel
variant that converge ε-uniformly with second-order accuracy in the time
variable. We also give the conditions under which the parallel computation
accelerates the solution process while preserving the higher-order
accuracy of the original schemes.



1 Introduction 

For several singularly perturbed boundary value problems, ε-uniformly convergent
finite difference schemes have been constructed and analyzed (see, e.g., [1]-[5]).
The time-accuracy of such schemes for nonstationary problems usually does
not exceed first order. The use of a defect correction technique allows us to
construct ε-uniform numerical methods with a higher order of accuracy in time
(see, e.g., [6,7]). Parallelization of the numerical method based on decomposition
of the problem makes it possible to solve the discrete problem on a computer
with several processors, which may accelerate the computational process. However,
this parallel process introduces additional errors in the numerical solutions. If
the numerical method is accurate in time with order more than one, then the
errors introduced by the domain decomposition (DD) can substantially exceed the
discretization errors. Therefore, it is necessary to construct the parallel method
such that the computation time is substantially less, and the accuracy is not lower,
than those for the corresponding nonparallel method.

* This research was supported in part by the Netherlands Organization for Scientific 
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In the case of singularly perturbed problems er-uniform parallel schemes based 
on the defect correction principle were studied in [8]. Parallel methods that 
allowed us to accelerate the numerical solution of the boundary value problems 
for parabolic reaction-diffusion equations on an interval were developed in [9,8]. 

In the present paper we consider the Dirichlet problem for a singularly perturbed
convection-diffusion equation on a rectangle in the case where the characteristics
of the reduced equation are parallel to the sides of the rectangle. In this
case regular and parabolic layers appear for ε → 0. To solve the problem, we
construct an ε-uniform scheme based on the defect correction method and its
parallel variant, which converge (ε-uniformly) with second-order accuracy in time. We
also write out the conditions under which the parallel computation accelerates
the solution process without losing the accuracy of the original schemes. The
technique for analysis of difference schemes is similar to that given in [8].

2 Problem Formulation 

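On the domain $G = D \times (0, T]$, $D = (0,1) \times (0,1)$, with boundary $S = \bar{G} \setminus G$, we
consider the Dirichlet problem for the singularly perturbed parabolic equation

$$\Big\{ \varepsilon \sum_{s=1,2} a_s(x,t) \frac{\partial^2}{\partial x_s^2} + b_1(x,t) \frac{\partial}{\partial x_1} - c(x,t) - p(x,t) \frac{\partial}{\partial t} \Big\} u(x,t) = f(x,t), \quad (x,t) \in G, \qquad (1a)$$

$$u(x,t) = \varphi(x,t), \quad (x,t) \in S. \qquad (1b)$$

Here $a_s(x,t)$, $b_1(x,t)$, $c(x,t)$, $p(x,t)$, $f(x,t)$, $(x,t) \in \bar{G}$, and $\varphi(x,t)$, $(x,t) \in S$,
are sufficiently smooth and bounded functions; moreover, $a_s(x,t) \ge a_0 > 0$,
$b_1(x,t) \ge b_0 > 0$, $p(x,t) \ge p_0 > 0$, $c(x,t) \ge 0$, $(x,t) \in \bar{G}$; $\varepsilon \in (0,1]$.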

Let $S = S^L \cup S_0$, $S_0 = \bar{S}_0$. We distinguish four faces in the lateral
boundary $S^L$: $S^L = \bigcup_{j=1}^{4} S_j$, $S_j = \Gamma_j \times (0, T]$, where $\Gamma_1$, $\Gamma_2$, $\Gamma_3$ and $\Gamma_4$ denote the
left, bottom, right and top sides of the rectangle $D$ respectively.

When the perturbation parameter ε tends to zero, regular and parabolic
layers appear respectively in the neighborhood of the boundaries $S_1$ and $S_2$, $S_4$.

3 Special Finite Difference Scheme 

On G we construct the piecewise uniform grid (see, e,g, [10,3]) 

Gh = Dh X Wo , Dh = wi X A ■ (1) 

Here wq is a uniform mesh on [0,T] with step-size r = T/Nq, lJs = ills(o's), 
s= 1, 2 is a piecewise uniform mesh with Ns intervals on the Xj-axis. To construct 
the mesh W 2 (c 2 ), we divide [0, 1 ] in three parts [ 0, CT 2 ], [ct 2 , 1 — ct 2 ], [ 1— <72, 1 ]; we 
take tT 2 = min[ 1/4, 77i2elniV2 ]. In each part we place a uniform mesh with N 2/2 



Throughout this paper we denote by M, (or rrii, m*-d) arbitrary, sufficiently 

large (small) positive constants independent of e and the discretization parameters. 






elements in [ ct 2 , 1 — ct 2 ] and with N 2 / 4 elements in each subinterval [ 0 , ct 2 ] and 
[1 — 172, !]■ When constructing aJi(cri), we divide [0,1] in two parts with the 
transition point cti = min[l/2, In iVi ], where 0 < mi < m^, = 

miiig [a]f^(x, t)j. We place a uniform mesh in [0, cti], [cti, 1] using A^i/2 

mesh elements in each subinterval. 
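The piecewise uniform meshes described above can be generated in a few lines. This is a minimal sketch with hypothetical function names; the transition parameter `sigma` is passed in directly, so no particular formula for it is assumed here, and `N` is taken to be divisible by 2 (one-sided mesh) or by 4 (two-sided mesh).

```python
def uniform_nodes(left, right, count):
    """count + 1 equally spaced nodes on [left, right]."""
    h = (right - left) / count
    return [left + i * h for i in range(count + 1)]


def mesh_one_sided(N, sigma):
    """Mesh on [0, 1] with N/2 elements in [0, sigma] and N/2 in [sigma, 1]
    (condensed near the left end, as for the regular layer)."""
    half = N // 2
    fine = uniform_nodes(0.0, sigma, half)
    coarse = uniform_nodes(sigma, 1.0, N - half)
    return fine + coarse[1:]            # do not repeat the transition point


def mesh_two_sided(N, sigma):
    """Mesh on [0, 1] with N/4 elements in [0, sigma], N/2 elements in
    [sigma, 1 - sigma] and N/4 elements in [1 - sigma, 1]."""
    quarter = N // 4
    left = uniform_nodes(0.0, sigma, quarter)
    middle = uniform_nodes(sigma, 1.0 - sigma, N - 2 * quarter)
    right = uniform_nodes(1.0 - sigma, 1.0, quarter)
    return left + middle[1:] + right[1:]
```

For instance, `mesh_two_sided(8, 0.1)` places two elements in each of `[0, 0.1]` and `[0.9, 1]` and four elements in between; as `sigma` shrinks with ε, the fine parts follow the layers while the total number of nodes stays fixed.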

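For problem (1) we use the difference scheme [11]

$$\Lambda_{(2)} z(x,t) = f(x,t), \quad (x,t) \in G_h, \qquad z(x,t) = \varphi(x,t), \quad (x,t) \in S_h, \qquad (2)$$

where

$$\Lambda_{(2)} \equiv \varepsilon \sum_{s=1,2} a_s(x,t)\, \delta_{x_s \bar{x}_s} + b_1(x,t)\, \delta_{x_1} - c(x,t) - p(x,t)\, \delta_{\bar{t}},$$

$\delta_{x_1} z(x,t)$ and $\delta_{x_s \bar{x}_s} z(x,t)$ are the first and the second differences of $z(x,t)$.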

Theorem 1. The solution of finite difference scheme {2), (!) converges e- 
uniformly to the solution of (!) with an error bound given by 

\u{x,t) - z{x,t)\ <M{NfhnNi+Nf:'^hi^N 2 + T), (x,t) eGh- 

Remark 1. Let u G j3 = iF-|-2-|-a, A" > 0, a > 0. Then the derivatives 

{d^° /dff°)u{x,t) and the divided differences z{x,t) satisfy the estimates 

< , (x, f) G G, fco < a: -h 2; (3) 

\5ij z{x,t)\ < M^^2y (x,t) G Gh, t>lr, 1<K + 1. (4) 

Here 5ijz{x,t) = (5;_it z{x,t) - 5;_it z{x,t- r))/r, (x,t) G Gh, t > It, I > 1, 
(5q( 2 (x,t) = z(x,t), and 5uz{x,t) denotes the backward difference of order 1. 






4 Parallelization of Finite Difference Scheme (2), (1) 

We derive the difference scheme to be solved on P > 1 parallel processors [8]. 

1. First we describe a partitioning of the domain D 

D = [jk=iD\ P'= = (0,l)x4, (1) 

where d^ are open intervals in (0,1) on the X 2 -axis. Let G^' = x (0,T], k = 
1, ... ,K. We denote the minimal overlap of the sets and = U^i i^k 
by 5^, and by 6 the smallest value of i.e., 

min p(x\x^)=(5, (2) 

k, , x^ 

x^gd'", x^gp''^', x\ x 2 ^ { P'^'npW } , k = l,...,K. 

In general, the value 5 may depend on the parameter e. 

Let each P^' be partitioned into P disjoint (possibly empty) parts 

D'^ = [J2=i k= I,..., K, pfnP" = 0, P^' = (0, 1) x 4^. (3) 
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We set X (0,T], p=l,...,P, k=l,...,K. 

Ai ^ 

We introduce the rectangular grids on each of the sets G and Gp : 

G^ G n G^(i) , Gp\ = Gp n G^(i) . 



(4) 



We define the prizm G(ti) with the boundary S{ti) = G(G) \ G(ti) by 

G{ti) = {{x,t) : {x,t)£G, ti<t<ti + r}, G, G + r S ZUo- 

Let the discrete function v{x,t\ti) be defined at the boundary mesh points 
Sh{ti) = ti e uJo- By v{x,t;ti) we denote the extension of this 

function to the grid set G/i(G) = G{ti)f^Gh- The ’’prizm” G/j(G) consists of 
only two time levels G/j(G) = {Dh x [t = G] } U {^h x [t = G + t] }. 

2. Before describing the difference scheme designed for parallel implementation
on $P$ processors, we assume that $z(x,t)$ is known for $t \le t^n$. Then we solve



Here the function (x,t) G G(t") defines the prolonged function 



(x,t) G G^f,{t^), 



(5a) 




for (x,t) £ Gpf^{t'^), k = l,...,K, G UJo, n < Nq — 1; 




for {x,t)GGhin, k=l,...,K, TGuJo. 



We define the function on the prizm G/j(G) by the relation 

Z(5 )(x,t) = {x,t) G Gh{f), f GuJo- (5b) 



The difference scheme (5) can be written in the operator form 



Q(5)(^(5)(2^>^); /(•)) '0(-)) = {x,t) G Gh- (5c) 




where 




(/3(x,t), (x,t) G Sh{t^), 

ip(x,t), {x,t) G Sh(t^) f] Sh, t> 
z{x,t), (x,t) G Sh(t^)\Sh, t = 




(5e) 



(x,t) G Sh{t"‘), n = 0,1, . . . , A^o - 1- 
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In the specific problem (5) we take ip{x, t; t”) = 0. 

Note that the intermediate problems in the discrete DD method (5), (4) 
are solved on the subsets = -Dp(3) n independently of each other (“in 
parallel” ) for all p = 1, . . . , P. 

Let the following condition be satisfied 

^ = <^(2)(^) > 0^ £G(0,1], inf [£-M(2)(e)] > 0. (6) 

e:G(0,lJ ^ ^ 

A technique similar to the one exposed in [6,7] gives us the error estimate 
\u{x,t)-z^5)ix,t)\<M{Nr^lnNi + N2^ln^N2 + N^^), (x,t)€Gh. (7) 



Theorem 2. Under condition (6) and for $N, N_0 \to \infty$, the solution of the difference
scheme (5), (4) converges to the solution of (1) ε-uniformly. The estimate
(7) holds for the solution of this difference scheme.



5 Improved Time-Accuracy. Parallel Scheme 

1. Constructing the defect-correction difference scheme on we rewrite the 
finite difference scheme (2) as in [7]: 

= /(x,t), (xff)&Gh, z^'^\xff) = ip{xff), (x,t) £ Su, (1) 

where z^^\x,t) is the uncorrected solution. To find the corrected solution 
z^^^xff), we solve the problem 



(x, f) € Gh, 



+ 

II 

c7 






1 2"^p(a:,t)T^2t ' 


z^'^\x,t) = (p{x,t), 


(x,t) e Sh- 



(2) 



Here the derivative {d'^ /dff)u{x,0) is obtained from equation (la). 

In the remainder of this section we consider a homogeneous initial condition 



(p{x, 0) = 0, X £ D. 

Under this condition, for the solution of problem (2), (1) we have 



(3) 



\u{x,t)-z^‘^\x,t)\<M[NfHnNi + N2‘^ln^N2 + T^], {x,t)£Gh. (4) 

Proceeding in a similar way, one can construct difference schemes with a
higher order of time-accuracy $O(\tau^l)$, $l > 2$ (see [7,8] for $l = 3$).

2. Let us consider a parallel version for the defect correction scheme. In the 
operator form the above difference scheme is written as follows 

f^^\-), p(-), = 0: (xff) G Gh, 



( 5 ) 
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where 

= f{x,t) + 

(2-^p{x,t)T{d^/dt'^)u{x,0),t = T, ] 

+ •{_-, N r nw [ 1 (2;,^) G G^, 

[2 ^p{x,t)TS2iz‘'^\x,t), t>2T} 

= z«(x,r+i)-z(i)(x,G), (x,t)eG^(r), t = t^+\ 

It is easy to see that z^^\x,t) = z^^. 4 ^(a:,t). 

Following the arguments from [6,7,9] we obtain the main convergence result. 

Theorem 3. Let condition (5) hold. Then, under condition (6), the solution of
the difference scheme (5), (4) converges, as $N, N_0 \to \infty$, to the solution of the
boundary value problem (1) ε-uniformly. For the discrete solution the estimate
(4) holds.



6 Acceleration of Computations by the Parallel Scheme 

To solve problem (1), we use scheme (2), (1) with improved time-accuracy
as the base scheme. One can also use the parallel variant of scheme (5), (4). We
say that the use of parallel computations leads to a real acceleration of the
solution process if a scheme with $P > 1$ parallel processors can be found
for which the computation time turns out to be smaller and the accuracy of the
approximate solution is not lower than those for the base scheme.

We shall consider the difference scheme for P parallel solvers on the meshes 

Gp?i = Gp n G/j , G/j = Dh X cuq , (1) 

where Dh = loq is a uniform mesh on [0,T] with the number of nodes 

Nq + 1 and the mesh step r^; generally speaking, ^ ^o(l)- 

1. We now describe the decomposition of the set D which can ensure the 
acceleration of the solution process. 

Let the domain D consist of J non-overlapping rectangles 

P<2>, j = l,...,J, (2a) 

where n = 0 for i^ j, D = Uf=i D"'^^ ■ J < M. On each of the 

sets G = D X [0,T], the mesh Gh with the given distribution of its nodes 

generates the meshes Gh = G n Gh, j = 1, . . . , J, Gh = Gh^)- For each 
of the sets we construct the rectangle containing together with 

some neighborhood. This set satisfies the three conditions: 
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(a) D contains the set of the points distant from D on the distance which 
is not smaller than Sq, where 



(5o = ''^(2) ^ with some fixed m 



( 1 ) 

( 2 ) 



(2b) 



(b) the sides of the set x [0, T] pass through the nodes of the mesh Gh\ 

(c) the number of nodes in each of the meshes O Dh is the same and 

it does not depend on the number j. 

Let the work time of the processors, assigned for resolving the discrete prob- 
lem on the level t = oi the mesh set Df^ from be defined by the value 

that is the number of nodes in the set 
The sets 

G^=D^x{0,T], j = (2c) 

form the preliminary covering of the set G, that is, G = Uj=i ■ Assume 

M:d 1) = /, M^l) = (l + mg)/r(:Df>), j = (2d) 

The sets (2c) are used for the construction of the special DD scheme (5), (1) 
with P processors. For this, we construct the sets 

k=l,...,K (3a) 



which cover the set G, where the value K = K{P) is chosen from the condition 
KP = J. The each of the sets G^^^ is multiply connected (for P > 1) and 
formed by the union of the P non-overlapping domains from (2c). Thus, for the 
subsets Gp which form the sets from (3a), the following condition holds: 

G^ C {G^ j = l,...,J}(2c), fc = l,...,K, p=l,...,P, (3b) 

where ^{Dpy) = G^^^ = Up=i ^p- With such decomposition the processors 

are loaded more effectively. 

2. By definition, we denote the work time, which is required to solve problems 
(2), (1) and (5), (4) respectively, by 



K 

d = d{No) = NoKDk), P) = NP Y, max ^(^J J- 



Then the rate of acceleration for our computations is defined by

_ _ -1 

G = G(iVo,<,P) = ^?(r^^)-i = iVo«)-i ^ max . 

^ k=l ^ 

3. We now give the conditions ensuring the acceleration of the solution process 
based on parallelization of scheme (2), (1). Here we assume that the derivative 
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{d^ /dt^)u{x, t) on the set G is not too small. Precisely, let the following condition 
hold 

^ ^ (x,t) e G" (4) 






on some set G = {(x,t) : x*^ < Xg < x* , 



^♦2 



= 1,2, , G* CG. 



In the case when the number P of processors is sufficiently large, i.e.. 



P>M(l + mg) (mg) 



/Vf(3) I n/fG) I 
M(3) + + m^3^ j 



— p* 



(5) 



the acceleration can be really attained for the numerical solution of the boundary 
value problem. In fact, the acceleration is achieved under the condition 



<=(l + mg) ^iVoP*. 



( 6 ) 



The value of G, which characterizes the attained rate of acceleration, is defined 
by 

G = P{P*)-\ P* = P*^^. (7) 



Theorem 4. Let conditions (5), {4)^ (4) hold for the solutions of the boundary 
value problem (!) and scheme (£), (J). Then, in the class of difference schemes 
(5), (!) for P parallel processors, e-uniform acceleration of solving problem (1), 
as compared to the base scheme {2), {!), can be achieved in general; in particular, 
for the decomposition (5), (!) the acceleration is achievable under condition {5). 
Moreover, for scheme (5), {3), (!) the acceleration is attained under conditions 
(5), (6), and the rate G of acceleration is defined by (7). 
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Abstract. In a recent paper [10], we described and analyzed a finite dif- 
ference discretization on adaptive sparse grids in three space dimensions. 
In this paper, we show how the discrete equations can be efficiently solved 
in an iterative process. Several alternatives have been studied before in 
Sprengel [16], where multigrid algorithms were used. Here, we report 
on our experience with BiCGStab iteration. It appears that, applied to 
the hierarchical representation and combined with Nested Iteration in 
a cascadic algorithm, BiCGStab shows fast convergence, although the 
convergence rate is not truly independent of the meshsize. 



1 Introduction 

Recently, the use of sparse grids has drawn considerable attention [4,6,7,10,11,16]
because of its prospects for a very efficient treatment of higher dimensional
problems. Most attention is directed towards the solution of three-dimensional
partial differential equations, because of their importance for scientific and
technical problems. The contrast between sparse grids and classical grids is
that on usual regular three-dimensional grids the number of gridpoints grows
as $O(h^{-3})$ with decreasing mesh-width $h$, whereas the number of mesh-points
grows only as $O(h^{-1} |\log h|^2)$ for sparse grids. For a solution, $u$, with sufficient
smoothness, the loss of accuracy (e.g. with piecewise trilinear approximation)
is remarkably small. Viz., with bounded mixed derivatives (at least in
the weak sense) the usual accuracy of $O(h^2)$ reduces to only $O(h^2 |\log h|^2)$.
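The difference in growth between the two kinds of grid can be made concrete by counting points. The sketch below rests on assumptions that are not spelled out in the text: the sparse grid of level $L$ is taken as the union of all dyadic grids with $n_1 + n_2 + n_3 \le L$ on the unit cube, boundary points included, and the function names are hypothetical.

```python
from itertools import product


def new_points_1d(level):
    """Number of 1D points that first appear at the given dyadic level
    on [0, 1]: the two end points at level 0, then 2**(level - 1)."""
    return 2 if level == 0 else 2 ** (level - 1)


def sparse_grid_points(L, dim=3):
    """Points in the union of all dyadic grids with n_1 + ... + n_dim <= L."""
    total = 0
    for n in product(range(L + 1), repeat=dim):
        if sum(n) <= L:
            count = 1
            for ni in n:
                count *= new_points_1d(ni)
            total += count
    return total


def full_grid_points(L, dim=3):
    """Points of the regular grid with mesh width 2**(-L) in every direction."""
    return (2 ** L + 1) ** dim
```

Comparing `sparse_grid_points(L)` with `full_grid_points(L)` for increasing `L` shows the gap widening quickly, in line with the $O(h^{-1}|\log h|^2)$ versus $O(h^{-3})$ growth for $h = 2^{-L}$.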

Here we should notice that the smoothness requirement is essential, and that, 
with sufficient smoothness, classical higher order methods may yield even more 
efficiency. As higher order methods can also be used in combination with sparse 
grids [4], both regular and sparse grids may have their own areas of application. 
However, it is clear that proper grid-alignment plays a more important role for 
sparse grids. Therefore, it is useful to see what grids should be used in practice 
under what circumstances. 

Considering the smoothness conditions required for the different approximations,
we see that the usual, regular approximations require $u \in C^k(\Omega)$, i.e.,
all derivatives up to some constant order $k$ should be bounded, whereas the error for


L. Vulkov, J. Wasniewski, and P. Yalamov (Eds.): NAA 2000, LNCS 1988, pp. 402—413, 2001. 
(c) Springer- Verlag Berlin Heidelberg 2001 






sparse grids is bounded mainly by the mixed derivatives. This implies that the 
error estimates in the former case are essentially direction-independent, whereas 
the error for the sparse grid case is dependent on the grid orientation. This may 
show the area of application of sparse grids: the cases where significant features 
of the solution can be captured by grid positioning. 

We do not want to go into detailed arguments on grid selection. However, we 
want to say that the study of sparse grids has led to new insights in the proper 
application of semi-refinement, hierarchical representation of functions, and the 
use of partially ordered sets of spaces for mesh-adaptive approximation. 

This paper concerns the solution of linear systems as they arise in the finite 
difference approximation of PDEs in 3D. The FD approach to the solution of
PDEs on sparse grids was initiated by Griebel in [7] and worked out in more
detail in [13]. More results are found in [10], where we described how the finite 
difference discretization is constructed and how the discrete functions can be 
represented on a nodal and on a hierarchical basis. Other relevant papers on the 
solution of 3D discrete systems on sparse grids are [6,11]. 

The emphasis of this note is on the experience with several solution algo- 
rithms for the finite difference discretization on sparse grids. The algorithms 
are based on a basic iterative solver (BiCGStab [1]) and Nested Iteration. The 
work is inspired by [12], where hierarchical basis preconditioners in three dimen- 
sions are described in a finite element context. The difference is that in [12] a 
classical sequence of meshes is used, constructed from tetrahedral elements and 
quasi-uniform refinement. It has been shown that, in that case, the condition of 
the matrix based on the hierarchical representation, preconditioned by a coarse 
grid operator is 0{h~^\logh\), where h is the mesh size. By diagonal scaling 
by levels, the condition number could be reduced to 0{h~^). Similarly, in the 
present paper, we observe also that the hierarchical representation gives a better 
convergence rate than the usual nodal representation. 

2 Adaptive Function Approximation 

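For an arbitrary $k = (k_1, k_2, k_3) \in \mathbb{N}_0^3$, we define a dyadic grid over $\Omega \subset \mathbb{R}^3$
by

$$\Omega_k^+ = \{ x_{k,j} \mid x_{k,j} = j \cdot h_k = (j_1 2^{-k_1}, j_2 2^{-k_2}, j_3 2^{-k_3}) \} \cap \bar{\Omega},$$

and we consider tensor-type basis functions $\varphi_{k,j}(x) = \prod_{i=1}^{3} \varphi(x_i / h_{k_i} - j_i)$, where
$\varphi(x) = \max(0, 1 - |x|)$ is the usual hat function. Given a continuous function
$u \in C(\Omega)$, we can approximate it by $u_n \in V_n = \mathrm{Span}\{\varphi_{n,j}\}$ by interpolation on
$\Omega_n^+$. Obviously, the function $u_n$ on $\Omega_n$ is given by

$$u_n = \sum_j a_{n,j}\, \varphi_{n,j}. \qquad (1)$$
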

We can make an approximation (1) for all grids $\Omega_n^+$ with $n \ge 0$. For large enough
$n$, the approximation can be arbitrarily accurate, but the number of degrees of
freedom increases geometrically with $|n| = n_1 + n_2 + n_3$. Therefore, in practice
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we select a ‘smallest’ n such that an accuracy criterion is satisfied. Notice that 
keeping the representations in all coarser spaces (all $V_k$, $0 \le k \le n$) does not take
essentially more coefficients than the representation on the finest grid (i.e., in $V_n$)
alone.

In order to obtain an efficient approximation, we can distinguish different
areas in the domain $\Omega$, in each of which we make the finest approximation of $u$
in a different $V_n$. We make full and efficient use of the system $\{V_n \mid n \in \mathbb{N}_0^3\}$ by
in principle approximating a given function $u \in C(\Omega)$ in all $\{V_n \mid n \in \mathbb{N}_0^3\}$, but
using in practice only those coefficients that contribute to a sufficiently accurate
representation. This implies that in practice the function $u$ is represented in a
particular $V_n$ only on part of the domain $\Omega$. To introduce a (minimal) structure
in the family of approximating basis functions $\{\varphi_{n,j}\}$, we introduce the following
condition H. (The H condition:) If a basis function $\varphi_{n,j}(x)$ is used in the
representation (1), then all corresponding coarser basis functions (i.e., functions
$\varphi_{k,i}$ for which $\mathrm{supp}(\varphi_{k,i}) \supset \mathrm{supp}(\varphi_{n,j})$) are also used for the representation.

E- and H-Representation. We call the representation of the approximation
of a function $u \in C(\Omega)$ by a collection of such (partial)
approximations (1) in the family of spaces $\{V_n\}$ the nodal
representation, or the E-representation of the approximation. This
E-representation requires the coefficients $a_{n,j} = u(x_{n,j})$,
corresponding with grid-points $x_{n,j}$, to be equal on the different grids
$\Omega_n^+$ at coinciding grid-points $x_{n,j}$. Thus, because points from
coarser grids coincide with those from finer ones, a certain consistency is
required (and a redundancy exists) in the E-representation of an
approximation.

Another way of representing approximations on the family of grids
$\{\Omega_n^+\}$ is by partitioning the approximation over the different
grids. Then, instead of (1) the approximation reads

$$u_h = \sum_n \sum_j a_{n,j}\, \varphi_{n,j}. \qquad (2)$$

In this case, of course, the set of coefficients $\{a_{n,j}\}$ always
determines a unique function $u_h$. However, for a given function $u_h$, now
the coefficients $\{a_{n,j}\}$ are not uniquely determined because the
$\{\varphi_{n,j}\}$ are linearly dependent. One way to select a special unique
representation is by choosing the coefficients $a_{n,j}$ such that
$a_{n,j} \neq 0$ only for those $(n,j)$ for which
$|||j||| = j_1 \cdot j_2 \cdot j_3$ is odd^1. This implies that $a_{n,j} = 0$
except for a pair $(n,j)$ for which $\Omega_n^+$ is the coarsest grid which
contains the nodal point $x_{n,j}$. This representation

$$u_h = \sum_{(n,j),\ |||j|||\ \mathrm{odd}} a_{n,j}\, \varphi_{n,j}$$

we call the H-representation because it represents the approximation in the
hierarchical basis

$$\{ \varphi_{n,j} \mid n \in \mathbb{N}_0^3,\ |||j|||\ \mathrm{odd} \}, \qquad (3)$$

and the part of $u_h$ in

$$W_n = \mathrm{Span}\{ \varphi_{n,j} \mid |||j|||\ \mathrm{odd},\ x_{n,j} \in \Omega_n^+ \}$$

is the hierarchical contribution from the grid $\Omega_n^+$ to the
approximation.

^1 More precisely, with "$|||j|||$ is odd" we mean: for all $i = 1, 2, 3$,
either $j_i$ is an odd integer, or $k_i = 0$ (i.e., $j_i$ lives on the
coarsest grid in the $i$-direction).

We notice that

$$V_n = W_n + \sum_{j=1}^{3} V_{n-e_j} = \sum_{0 \le m \le n} W_m,$$

and the sparse grid space is defined by

$$V_L = \sum_{0 \le |m| \le L} W_m,$$

corresponding to a sparse grid
$\Omega_L^+ = \bigcup_{0 \le |m| \le L} \Omega_m^+$. Interpolating the
function $u$ at the nodal points $x_{n,j}$, the hierarchical coefficients
$a_{n,j}$ in

$$u_h = \sum_{(n,j),\ |||j|||\ \mathrm{odd}} a_{n,j}\, \varphi_{n,j}$$

are determined by (cf. [9])

$$a_{n,j} = \prod_{i=1}^{3} \left[ -\tfrac12,\ 1,\ -\tfrac12 \right]_{h_{n_i} e_i} u(j h_n),$$

where $[-\tfrac12, 1, -\tfrac12]_{h_{n_i} e_i}$ denotes the difference stencil
for the mesh-size $h_{n_i}$ in the $i$-th coordinate direction. Notice that
this expression is well-defined for each odd $j$ because Condition H requires
that all $h_n$-neighbors are nodal points in the approximation.

For piecewise multilinear functions, it is often described [5,6,7] how a
pyramid algorithm can be used to convert an E-representation to an
H-representation, and vice versa. Such a conversion can be executed in $O(N)$
operations, where $N$ is the total number of coefficients.
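A one-dimensional sketch of such a pyramid conversion is given below. It assumes a full dyadic grid with $2^L + 1$ points on $[0,1]$ and the stencil $[-\tfrac12, 1, -\tfrac12]$ for the hierarchical coefficients; the function names are hypothetical, and the three-dimensional algorithms of [5,6,7] apply the same sweep direction by direction.

```python
import numpy as np

def nodal_to_hier(values):
    """E -> H conversion on a 1D dyadic grid with 2**L + 1 points."""
    a = np.array(values, dtype=float)
    L = (len(a) - 1).bit_length() - 1
    # Sweep from the finest level to level 1. The two end points (level 0)
    # keep their nodal values.
    for level in range(L, 0, -1):
        step = 2 ** (L - level)                  # spacing in fine-grid indices
        idx = np.arange(step, len(a), 2 * step)  # odd points of this level
        a[idx] -= 0.5 * (a[idx - step] + a[idx + step])
    return a

def hier_to_nodal(coeffs):
    """H -> E conversion: the inverse sweep, from coarse to fine."""
    a = np.array(coeffs, dtype=float)
    L = (len(a) - 1).bit_length() - 1
    for level in range(1, L + 1):
        step = 2 ** (L - level)
        idx = np.arange(step, len(a), 2 * step)
        a[idx] += 0.5 * (a[idx - step] + a[idx + step])
    return a
```

Each point is visited once per conversion, which is the $O(N)$ cost quoted above. In the fine-to-coarse sweep the neighbors of a level-$l$ point belong to coarser levels and still hold nodal values, so the update can be done in place.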



The Data Structure. The data structure to implement all the above
possibilities of an adaptive (sparse) grid representation can be efficient
and relatively simple. For the d-dimensional case (d = 1, 2, 3), we use the
data structure BASIS3 [8] that takes the 'patch' $P_{n,j}$ as an elementary
entity. This $P_{n,j}$ takes all information related to a right-open
left-closed cell

$$\prod_{k=1}^{3} \left[\, j_k 2^{-n_k},\ (j_k + 1)\, 2^{-n_k} \right).$$

This implies that there exist as many patches in the data structure as there
are points used in the description of the approximation. The patches are
related to each other by means of pointers in an intertwined tree structure,
where each patch has at most 15 pointers to related patches (3 fathers, 6
neighbors and 6 kids). The data structure is symmetric with respect to any of
the coordinate directions.
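To make the pointer layout concrete, here is a minimal sketch of a patch record. It is a hypothetical stand-in written for illustration, not the interface of the data structure in [8]; the class name, field names, and the assumption $h_{n_k} = 2^{-n_k}$ are ours.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class Patch:
    """Hypothetical patch P_{n,j}: one grid point plus its 15 pointers."""
    n: Tuple[int, int, int]            # level in each coordinate direction
    j: Tuple[int, int, int]            # index of the point x_{n,j} = j * 2**(-n)
    value: float = 0.0                 # coefficient stored with the point
    fathers: List[Optional["Patch"]] = field(default_factory=lambda: [None] * 3)
    neighbors: List[Optional["Patch"]] = field(default_factory=lambda: [None] * 6)
    kids: List[Optional["Patch"]] = field(default_factory=lambda: [None] * 6)

    def cell(self):
        """Left-closed, right-open cell [j_k h_k, (j_k + 1) h_k) per direction."""
        return [(jk * 2.0 ** (-nk), (jk + 1) * 2.0 ** (-nk))
                for jk, nk in zip(self.j, self.n)]

    def contains(self, x):
        """True if the point x lies in the half-open cell of this patch."""
        return all(lo <= xk < hi for xk, (lo, hi) in zip(x, self.cell()))
```

One father, two neighbors and two kids per coordinate direction give the 3 + 6 + 6 = 15 pointers mentioned in the text, and treating the three directions identically keeps the structure symmetric.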
Fig. 1. Regular sparse grid for $\Omega = (0, 1)^3$ (left) and an adaptive
sparse grid (ASG) (right)

3 Difference Operators for ASG Functions 

Although finite element discretization of a PDE on a sparse grid is feasible
for a constant coefficient problem in two dimensions, finite elements run
into difficulties for higher-dimensional problems and variable coefficients.
The difficulty arises because, with the hierarchical basis (3) for test and
trial space, the computational complexity of the evaluation of the discrete
operator becomes too large. This is caused by the fact that the intersection
of the supports of an arbitrary trial and test function is much smaller than
the supports of these functions themselves. As a consequence, the advantage
of sparse grids is lost if the FEM discrete operator is evaluated.

The alternative, as already suggested in [7,13], is the use of a finite
difference discretization. Therefore, in order to solve PDEs on sparse grids,
we should be able to apply (approximate) differentiation to discrete
representations of approximations as described in [10]. The application of
linear difference operators of the form

$$L_h u_h = \sum \ldots + c(x)\, u_h(x) \qquad (4)$$

comes down to the construction of linear combinations, the pointwise
multiplication, and the differentiation of functions (2). In both
representations the construction of a linear combination over the real
numbers is directly computed by application of the linear combination to the
coefficients. Pointwise multiplication is only possible in the
E-representation, in which the function values at grid-points are directly
available. For a description of the evaluation of first and second order
derivatives we again refer to [10].



First and Second Order Interpolation. Because we use piecewise tri-linear
basis functions $\varphi_{n,j}(x)$ on the grid $\Omega_n^+$, truncating at a
particular level corresponds with tri-linear interpolation between the nodal
points included. In this way, piecewise tri-linear interpolation is natural
in the finite hierarchical representation.

For $C^{2,2,2}(\Omega)$-functions, the behavior of the coefficients
$a_{n,j}$ is rather predictable for higher levels of approximation because
[9, Lemma 3.2] gives a precise relation with the second order cross
derivatives, or in lower dimensional manifolds (at the coarsest level, at the
boundaries, or in mixed H-E-representations over the different coordinate
directions) with the second order derivatives. This allows for an efficient
quadratic interpolation procedure when a finite hierarchical representation
of a discrete function is available. To interpolate the function

$$u_\ell(x) = \sum_{|n| \le \ell}\ \sum_{j,\ |||j|||\ \mathrm{odd}} a_{n,j}\, \varphi_{n,j}(x) \qquad (5)$$

with second order accuracy to a function $u_{\ell+1}$, the coefficients
$\{a_{n,j} \mid |n| = \ell + 1\}$ can be derived from the coefficients
$\{a_{n,j} \mid |n| = \ell\}$ by taking the new coefficients
$a_{m,k} = a_{n,j}/4$, where $|m| = |n| + 1$ and $m$ and $j$ satisfy
$|x_{m,k} - x_{n,j}| < 2^{-\ell}$. This corresponds with the extrapolation
assumption that the second order derivative is slowly varying (constant) over
the smallest covering cell $\Omega_{n,j}$. In order to maintain symmetry over
the coordinate directions, in the case of a non-unique smallest covering cell
one may take the mean value of the coefficients of all (at most d - 1)
smallest covering cells. In this way, we introduce the second order
interpolation operator $\tilde{P}_{\ell+1,\ell}$, defined by

$$u_{\ell+1} = \tilde{P}_{\ell+1,\ell}\, u_\ell, \qquad (6)$$

where both $u_\ell$ and $u_{\ell+1}$ are described by (5). First order
interpolation is simply achieved by setting $a_{m,k} = 0$ for
$|m| = |n| + 1$.
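The factor 1/4 can be checked in one dimension, where the hierarchical coefficient at a point with mesh width $h$ is $u(x) - \tfrac12(u(x-h) + u(x+h)) \approx -\tfrac12 h^2 u''(x)$, so halving $h$ divides it by four when $u''$ is constant. The sketch below is an illustration under that 1D assumption; the function names are hypothetical.

```python
def hier_coeff(u, n, j):
    """1D hierarchical coefficient at x = j * 2**(-n), for n >= 1 and odd j."""
    h = 2.0 ** (-n)
    x = j * h
    return u(x) - 0.5 * (u(x - h) + u(x + h))

def predict_next_level(coeffs_n):
    """Second order prediction of the next level from {j: a_{n,j}}.

    In 1D the new odd points next to x_{n,j} have indices 2j - 1 and 2j + 1
    on level n + 1, and each receives a quarter of the parent coefficient.
    """
    return {k: a / 4.0
            for j, a in coeffs_n.items()
            for k in (2 * j - 1, 2 * j + 1)}
```

For $u(x) = x(1 - x)$ the second derivative is constant, so the prediction reproduces the level $n + 1$ coefficients exactly; for a general smooth $u$ it is only an approximation, and first order interpolation corresponds to predicting zero instead.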

4 Solution of the Finite Difference Discretization for the 
Laplacian 

In the remaining part of this paper, as an example of (4), we solve the
discretized operator equation as it was described in detail in [10]. For
simplicity, we restrict ourselves to the model problem of Poisson's equation
with homogeneous Dirichlet boundary conditions,

$$-\Delta u = f \ \text{in}\ \Omega, \qquad u|_{\partial\Omega} = 0, \qquad (7)$$

on the cube $\Omega = (0, 1)^3$ and a regular sparse grid.



Iteration Based on a Galerkin Relation. In [10], an analysis of the
discretization was made and multilevel-type algorithms, based on the Galerkin
structure of the equations, were proposed. The coarse grid operators involved
were no longer finite difference operators. In an obvious way, the Galerkin
relations lead to iterative (defect correction) solution algorithms that are
applied in a multilevel setting. However, no spectral equivalence could be
established, and the convergence of the iterative schemes appears to depend
on the maximum discretization level used, so that the convergence rate slows
down on finer grids. The algorithm is briefly characterized in Figure 2 (for
details see [10]). Applied to the 3D problem (7), with a right-hand side
$f(x)$ built from $\sin \pi x_i$ and $\prod_{i=1}^{3} \sin 8\pi x_i$ terms,
and starting from the zero function $u_h = 0$, we obtain the convergence
behavior shown in Figure 3. We see that we get better convergence if we also
include lower levels (right). In both cases, however, the speed of
convergence slows down with growing levels. Approximately, the reduction
factor gets worse with the square of the highest level. The slow convergence
motivates us to see if better convergence could be obtained by cascadic
iteration.



    for \ell from L_0 to L
    do for i = 1 to \nu
       do for all |n| = \ell
          do u_h := u_h + P_{L,n} L_n^{-1} R_{n,L} (f_L - L_L u_h) enddo      (G)
       enddo
    enddo

Fig. 2. The Galerkin algorithm (G)

[Figure: residual against the number of cycles. Left panel: cycles with
L = 6, ..., 9, L_0 = L. Right panel: cycles with L = 6, ..., 9, L_0 = 3.]

Fig. 3. Left: Convergence of Algorithm (G) for the levels
$L = L_0 = 6, \ldots, 9$. Right: Convergence of Algorithm (G) for the levels
$L = 6, \ldots, 9$, $L_0 = 3$ with $\nu = 1$



Cascadic Iteration. By construction, the sparse grids and the sparse grid
spaces are provided with a multilevel structure, i.e.,
$\Omega_\ell^+ \subset \Omega_{\ell+1}^+$ and $V_\ell \subset V_{\ell+1}$.
Moreover, in [10], we could prove a Galerkin relation

$$L_\ell = R_{\ell,\ell+1}\, L_{\ell+1}\, P_{\ell+1,\ell}$$

for the discrete Laplace operator $L_\ell$ in hierarchical representation.
Here, $R_{\ell,\ell+1}$ denotes the natural hierarchical restriction and
$P_{\ell+1,\ell}$ is the first order interpolation. This will be used in a
cascadic iteration.

In [2,3], Bornemann and Deuflhard proposed the cascadic multigrid method.
In this method, a solution is computed by nested iteration on a sequence of
refining grids, without coarse grid corrections applied on the finer grids.
In cascadic MG, more basic iterations are used on the coarser than on the
finer levels. It has been proved [3,14] that cascadic MG applied to a FEM
discretization using P1-conforming elements for the second order 3D problem
is accurate with an optimal computational complexity for all conventional
iterative methods, like Jacobi or Gauss-Seidel iteration, as well as for the
conjugate gradient method as a smoother. However, in the 2D case the cascadic
MG method gives an accurate solution with optimal complexity for the CG
method, but only nearly optimal complexity for the other conventional
smoothers.

In [15], it is shown that this is also true for other conforming or
nonconforming elements, provided that $m_l \ge \beta^{L-l} m_L$, with $m_l$
the number of iterations on level $l$ and some constant $\beta$ depending on
the relaxation method.
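The distribution of smoothing steps over the levels can be illustrated with a small helper. It assumes the usual cascadic rule $m_l = \lceil \beta^{L-l} m_L \rceil$ (an assumption here, since the condition in the text is only partly legible); the function names and the example level sizes are hypothetical.

```python
import math

def cascadic_iterations(L, m_L, beta):
    """Iterations m_l for levels l = 0..L under m_l = ceil(beta**(L - l) * m_L)."""
    return [math.ceil(beta ** (L - l) * m_L) for l in range(L + 1)]

def total_work(points_per_level, iterations):
    """Work estimate: one unit per grid point per basic iteration."""
    return sum(n * m for n, m in zip(points_per_level, iterations))
```

For example, `cascadic_iterations(3, 2, 2)` spends most iterations on the coarsest level, where they are cheapest, and only `m_L = 2` on the finest one.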




Fig. 4. Cascadic iteration: the problem is approximately solved on a coarser
(lower) grid before interpolation to a finer (higher) grid is made. The cycle
over all levels is repeated in an outer defect correction (iterative
refinement) process. The levels used are the union of the grids
$\Omega_n^+$ with $|n| = k$, $k = 1, 2, \ldots, 10$. The number of points at
each level is given in Table 1

For iteration, we use a cascadic application of the BiCGStab algorithm [1]
for the solution of $L_h u_h = f_h$. The algorithm is shown in Figure 5. In
the algorithm, $R_{\ell,L}$ denotes the natural hierarchical restriction and
$P_{\ell+1,\ell}$ is either the first order prolongation or the second order
prolongation (6). Computations are made with this algorithm on meshes up to
10 levels. The corresponding number of gridpoints is given in Table 1.



Table 1. The number of points on the different levels

levels k:   0   1    2    3     4     5     6      7      8       9      10
points #:   8  44  158  473  1286  3302  8170  19699  46594  108568  249910



The workhorse of the solution algorithm is the BiCGStab iteration. Because
of the non-sparse structure of the matrix representation of the sparse grid
discrete operators, we are only interested in matrix-free methods. This
restricts the choice of the applicable preconditioning methods. In fact, for
preconditioning we restrict ourselves to diagonal scaling and transformation
between E- and H-representation. We exploit the available hierarchical
structure of the approximate solution by the computation of a good initial
approximation on a given level by interpolation of a sufficiently accurate
solution that is computed on a coarser level. Thus, starting from a coarsest
grid, we obtain the cascadic algorithm.

    until a convergence criterion is satisfied
    do d_h := f_h - L_h u_h
       c_h := 0
       for \ell from L_0 to L
       do int i=0, ic=0;
          until i > imax do
             real n, \rho, \beta, \omega, \alpha = 0.0, \rho_0 = 1.0, \omega_0 = 1.0;
             r_h := R_{\ell,L} d_h - L_\ell c_h
             n = (r_h, r_h)
             if i=0 then n_0 = n endif
             if n < \epsilon then break endif
             v_h = 0
             p_h = v_h
             \tilde{r}_h = r_h
             until i > imax do
                \rho = (\tilde{r}_h, r_h)
                if |\rho| < \epsilon then break endif
                \beta = (\rho/\rho_0)(\alpha/\omega_0)
                p_h = r_h + \beta (p_h - \omega v_h)
                v_h = L_\ell p_h
                d = (\tilde{r}_h, v_h)
                if |d| < \epsilon then d = 1.0 endif
                \alpha = \rho/d
                r_h = r_h - \alpha v_h
                t_h = L_\ell r_h
                d = (t_h, t_h)
                if |d| < \epsilon then break endif
                \omega = (t_h, r_h)/d
                c_h = c_h + \alpha p_h + \omega r_h
                r_h = r_h - \omega t_h
                \rho_0 = \rho
                \omega_0 = \omega
                n = (r_h, r_h)
                if |\omega| < \epsilon then break endif
                if ic > ic_max then ic=0; break; endif
                i = i+1
             enddo
          enddo
          c_h := P_{\ell+1,\ell} c_h
       enddo
       u_h := u_h + c_h
    enddo

Fig. 5. The cascadic iteration algorithm with BiCGStab
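The matrix-free requirement means that the solver may only call a routine that applies the operator to a vector. The following is a minimal sketch of unpreconditioned BiCGStab in that style, not the authors' implementation; the 1D three-point Laplacian used as the operator, and all function names, are assumptions chosen to keep the example self-contained.

```python
import numpy as np

def apply_laplacian_1d(v):
    """Matrix-free stencil [-1, 2, -1] with zero Dirichlet boundary values."""
    out = 2.0 * v
    out[1:] -= v[:-1]
    out[:-1] -= v[1:]
    return out

def bicgstab(apply_A, b, x0=None, tol=1e-10, maxiter=200):
    """Unpreconditioned BiCGStab for A x = b; A is only available as v -> A v."""
    x = np.zeros_like(b, dtype=float) if x0 is None else np.array(x0, dtype=float)
    r = b - apply_A(x)
    r_hat = r.copy()                       # fixed shadow residual
    rho_old = alpha = omega = 1.0
    v = np.zeros_like(r)
    p = np.zeros_like(r)
    b_norm = np.linalg.norm(b) or 1.0
    it = 0
    for it in range(1, maxiter + 1):
        if np.linalg.norm(r) <= tol * b_norm:
            return x, it - 1
        rho = r_hat @ r
        if rho == 0.0:                     # breakdown of the recurrence
            break
        beta = (rho / rho_old) * (alpha / omega)
        p = r + beta * (p - omega * v)
        v = apply_A(p)
        alpha = rho / (r_hat @ v)
        s = r - alpha * v                  # intermediate residual
        t = apply_A(s)
        tt = t @ t
        if tt == 0.0:                      # s vanished: the half step is exact
            x = x + alpha * p
            r = s
            break
        omega = (t @ s) / tt
        x = x + alpha * p + omega * s
        r = s - omega * t
        rho_old = rho
    return x, it
```

In a cascadic setting the returned `x` on one level would be prolongated and passed as `x0` on the next finer level, so that only a few iterations are needed there.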

First, the algorithm was applied both to the E-representation and to the 
H-representation of the solution, and it appeared that the solution of the H- 
representation is much faster. This is in agreement with the findings of Ong [12] 
for the solution of a FEM discretization with the tetrahedral element and quasi- 
uniform refinement, as discussed in the introduction. As a consequence we further 
only considered iteration with the H-representation. 

By itself, BiCGStab is not a very efficient solver, but combined with
cascadic switching between the levels we obtain an algorithm that solves the
equation up to truncation error accuracy in only a few (outer) cycles. This
is shown in Figures 6 and 7. In Fig. 6, we see the difference between using a
large number of (inner) BiCGStab iterations and using a small number. In
Fig. 7, on level 10, we see the difference between the use of the first order
prolongation (left) and the second order formula (6) (right). We clearly see
that second order interpolation gives much better convergence, so that
truncation error accuracy is obtained in a small number (4) of outer
iteration cycles.



Legend to Figures 6 and 7. Top figures: the logarithm of the two-norm of the
measured residual at different levels and in the inner loop, against the number 
of inner iterations. Bottom figures: logarithm of the residual and the global 
discretization error of the solution of the target equation against the number 
of elementary operations (flops). The constant lines indicate the approximation 
error and the local truncation error. 

5 Conclusion 

Because the evaluation of finite element stiffness matrices for variable
coefficient equations on sparse grids in three dimensions still yields
difficulties, finite differences are an interesting alternative. In this
paper, we show how a cascadic multigrid application of BiCGStab yields an
efficient solution method for the resulting discrete equations.

The method applies the BiCGStab iteration to the H-representation of the
discrete solution, uses second order interpolation between the different
levels of discretization, and applies global defect correction (iterative
refinement) as an outer iteration cycle. Results for this solution method are
presented which show that 3 or 4 iteration cycles may be sufficient to solve
the discrete equations up to local truncation error accuracy.
Fig. 6. The advantage of spreading inner iterations over more outer iterations.
Left: a single outer iteration with 36 inner iterations at each level. Right: 6 outer 
iterations with 6 inner iterations each 




Fig. 7. Convergence at level k = 10. Left: first order interpolation between the 
levels. Right: second order interpolation 
Abstract. The problem of optimizing the energy dissipation in a conductive
electromagnetic medium is considered. The domain is known a priori and is
fixed throughout the optimization process. We apply a

1 Introduction

Computation of electromagnetic fields in various settings, analysis and different 
approaches for the spatial discretization of the Maxwell equations have been a 
subject of intense research in the last decade, see, e.g., [2,7,10]. In this paper we 
consider problems concerning topology optimization in electromagnetic media. 
For a general overview on the field of structural optimization and topology de- 
sign, we refer to [3]. We are looking for an optimal distribution of conductivity 
in a fixed geometrical configuration. 

Let $\Omega \subset \mathbb{R}^3$ be a domain occupied by a conductor with a
conductivity $\sigma > 0$. The rest of the space is vacuum. To simplify the
presentation, we consider the stationary case, i.e., constant currents are
available in the conductor (div J = 0). In this case the Maxwell equations
read:

$$\mathrm{curl}\, E = -\partial_t B, \quad \mathrm{curl}\, H = J, \quad \mathrm{div}\, D = \rho, \quad \mathrm{div}\, B = 0, \qquad (1)$$

* This work was supported in part by the Alexander von Humboldt Foundation. The 
second author has also been supported by the Bulgarian Ministry for Education, 
Science, and Technology under Grant MM-98 t^801. 
supplemented by the following material laws:

$$D = \varepsilon E, \quad B = \mu H, \quad J = \sigma E. \qquad (2)$$

Here, the fundamental electromagnetic quantities are the electric field $E$,
the magnetic induction $B$, the magnetic field $H$, the electric induction
$D$, the electric current density $J$, and the space charge density $\rho$.
We consider only linear and isotropic materials, so that the electric
permittivity $\varepsilon$, the magnetic permeability $\mu$, and the electric
conductivity $\sigma$ are supposed to be bounded scalar functions of the
spatial variable $x$ with $\varepsilon \ge \varepsilon_0 > 0$,
$\mu \ge \mu_0 > 0$, and $\sigma > 0$. Steep jumps of these coefficients may
occur at material interfaces. One can introduce a scalar electric potential
$\varphi$ and a magnetic vector potential $A$, so that

$$E = -\mathrm{grad}\, \varphi - \partial_t A \quad \text{and} \quad B = \mathrm{curl}\, A. \qquad (3)$$

To specify $A$, which is not uniquely defined, we use the Coulomb gauge,
namely, div A = 0. From (2) and (3) one gets
$J = \sigma E = -\sigma\, \mathrm{grad}\, \varphi - \sigma\, \partial_t A$,
which yields

$$\mathrm{div}\, J = \mathrm{div}(\mathrm{curl}\, H) = 0 = -\mathrm{div}(\sigma\, \mathrm{grad}\, \varphi) - \mathrm{div}(\sigma\, \partial_t A). \qquad (4)$$



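Suppose now that $\sigma$ is piecewise constant, i.e., independent of the
spatial variable $x$. Then div A = 0 results in
$\mathrm{div}(\sigma\, \partial_t A) = 0$. From (4) we get the following
coupled system of equations for $\varphi$ and $A$:

$$\mathrm{div}(\sigma\, \mathrm{grad}\, \varphi) = 0 \ \text{in}\ \Omega, \qquad
n \cdot \sigma\, \mathrm{grad}\, \varphi =
\begin{cases} I_\nu & \text{on } \Gamma_\nu \subset \partial\Omega, \\ 0 & \text{otherwise,} \end{cases} \qquad (5)$$

$$\sigma\, \partial_t A + \mathrm{curl}(\mu^{-1}\, \mathrm{curl}\, A) =
\begin{cases} -\sigma\, \mathrm{grad}\, \varphi & \text{in } \Omega, \\ 0 & \text{in } \mathbb{R}^3 \setminus \Omega. \end{cases} \qquad (6)$$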


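Here, the unit normal vector is denoted by $n$. For the given electric
current densities $\{I_\nu\}$ on the boundary parts
$\Gamma_\nu \subset \partial\Omega$ we impose the compatibility condition
$\sum_\nu \int_{\Gamma_\nu} I_\nu\, ds = 0$. The energy dissipation given by
the Joule-Lenz law reads as follows:

$$f(\varphi, \sigma) := \int_\Omega J \cdot E\, dx = -\int_\Omega J \cdot \mathrm{grad}\, \varphi\, dx = -\int_\Omega \mathrm{div}(\varphi J)\, dx. \qquad (7)$$

Using the Gauss-Ostrogradski formula and the Neumann boundary conditions
from (5) we get the following expression:

$$f(\varphi, \sigma) = -\int_{\partial\Omega} n \cdot J\, \varphi\, ds = \sum_\nu \int_{\Gamma_\nu} I_\nu\, \varphi\, ds. \qquad (8)$$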
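The steps between (7) and (8) can be written out as follows. This is a worked sketch under the stationary assumption $\partial_t A = 0$, so that $E = -\mathrm{grad}\,\varphi$ and $J = -\sigma\,\mathrm{grad}\,\varphi$; the sign conventions are those of (2), (3) and (5).

```latex
\begin{aligned}
f(\varphi,\sigma)
  &= \int_\Omega J \cdot E \, dx
   = \int_\Omega \sigma\, |\operatorname{grad}\varphi|^2 \, dx \;\ge\; 0, \\
\operatorname{div}(\varphi J)
  &= \operatorname{grad}\varphi \cdot J + \varphi \operatorname{div} J
   = \operatorname{grad}\varphi \cdot J
   \qquad (\operatorname{div} J = 0), \\
f(\varphi,\sigma)
  &= -\int_\Omega \operatorname{div}(\varphi J)\, dx
   = -\int_{\partial\Omega} \varphi \, n \cdot J \, ds
   = \int_{\partial\Omega} \varphi \, n \cdot \sigma \operatorname{grad}\varphi \, ds
   = \sum_\nu \int_{\Gamma_\nu} I_\nu \, \varphi \, ds .
\end{aligned}
```

The first line shows that the dissipation is nonnegative, and the last line shows that it depends on $\sigma$ only through the potential $\varphi$ on the contacts $\Gamma_\nu$.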



The remainder of this paper is organized as follows. In Section 2 we
introduce the primal-dual formulation of our nonlinear nonconvex programming
problem. Slack variables are added directly to the optimization problem. In
Section 3 we discuss the steplength strategy and give the interior-point
algorithm. In the last section, we include some numerical experiments
concerning the conductivity distribution for a two-dimensional isotropic
system.

2 Primal-Dual Approach

In this section, we formulate the nonlinear nonconvex optimization problem
for a minimization of the energy dissipation given by (8):

$$\min f(\varphi, \sigma), \qquad (9)$$

subject to the following constraints:

$$\begin{aligned}
&\varphi \ \text{satisfies (5)}, \\
&\int_\Omega \sigma\, dx = C \quad \text{(mass constraint)}, \\
&\sigma_{\min} \le \sigma \le \sigma_{\max} \quad \text{(conductivity box constraint)}.
\end{aligned} \qquad (10)$$

Here, $\sigma_{\min}$ and $\sigma_{\max}$ are a priori given positive limits
for the conductivity and $C$ is a fixed given value. Note that we formulate a
constrained optimization problem, where the differential equation for
$\varphi$ (5) is part of the constraints. This is in contrast to many
standard optimization approaches, which would consider $\varphi$ as a
function of the independent variable $\sigma$ via the differential equation.
However, this simultaneous optimization approach reduces the overall
computational complexity of the resulting optimization algorithm.

We apply the primal-dual interior-point method, originally proposed for
linear programs by [8]. This method has been recently extended to nonlinear
programming in [1] and started to prove its impressive computational
performance for nonlinear programming, see, e.g., [5,6,12]. We deal with the
corresponding inequality constraints by introducing nonnegative slack
variables. This variant of the primal-dual approach has been used, e.g., in
[1,9]. After a finite element discretization of the domain we get the
following finite dimensional nonlinear programming problem:

$$\min f(\varphi, \sigma), \qquad (11)$$

$$\text{subject to} \quad
\begin{aligned}
A(\sigma)\varphi - b &= 0, & \sigma_{\min} e - \sigma + s &= 0, \\
g(\sigma) - C &= 0, & \sigma - \sigma_{\max} e + t &= 0,
\end{aligned}
\qquad s \ge 0, \ t \ge 0, \qquad (12)$$

where $A(\sigma)$ is the finite element stiffness matrix, $b$ is the discrete
load vector and $g(\sigma)$ is a discrete approximation of
$\int_\Omega \sigma\, dx$. Here, $e \in \mathbb{R}^N$,
$e = (e_1, \ldots, e_N)^T$, $e_i = 1$, $1 \le i \le N$, and
$\sigma, s, t \in \mathbb{R}^N$, where $N$ is the number of finite elements.
Note that the lower bound $\sigma_{\min}$ plays a crucial role in keeping the
ellipticity of the discrete problem.

The Lagrangian function associated with problem (11)-(12) is:

$$\begin{aligned}
\mathcal{L}(\varphi, \sigma, \lambda, \eta, z, w, s, t, \alpha, \beta) :={}& f(\varphi, \sigma) + \lambda^T (A(\sigma)\varphi - b) + \eta\, (g(\sigma) - C) \\
&+ z^T (\sigma_{\min} e - \sigma + s) + w^T (\sigma - \sigma_{\max} e + t) - \alpha^T s - \beta^T t.
\end{aligned} \qquad (13)$$

Here, $\lambda$, $\eta$, $z \ge 0$, $w \ge 0$ and $\alpha \ge 0$,
$\beta \ge 0$ are the Lagrange multipliers for the equality and inequality
constraints in (12), respectively. Our purpose is to find an isolated
(locally unique) local minimum of the problem (11)-(12) under the assumption
that at least one such point exists. We suppose that the standard conditions
for the application of Newton's method, see, e.g., [4], are satisfied. Denote
by $\Phi := (\varphi, \sigma, \lambda, \eta, z, w, s, t)$ the vector of the
unknown variables. The complementarity conditions $Z s = 0$ and $W t = 0$ are
replaced by the perturbed complementarity conditions $Z s = \rho e$ and
$W t = \rho e$. At each iteration, the positive parameter $\rho$ is decreased
by a certain amount.

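The necessary first-order Karush-Kuhn-Tucker (KKT) optimality conditions
lead to the following nonlinear equation:

$$F_\rho(\Phi) :=
\begin{pmatrix}
\nabla_\varphi \mathcal{L} \\ \nabla_\sigma \mathcal{L} \\ \nabla_\lambda \mathcal{L} \\ \nabla_\eta \mathcal{L} \\
\nabla_z \mathcal{L} \\ \nabla_w \mathcal{L} \\ \nabla_s \mathcal{L} \\ \nabla_t \mathcal{L}
\end{pmatrix}
=
\begin{pmatrix}
\nabla_\varphi f + A(\sigma)^T \lambda \\
\partial_\sigma (\lambda^T A(\sigma)\varphi) + \eta\, \nabla g(\sigma) - z + w \\
A(\sigma)\varphi - b \\
g(\sigma) - C \\
\sigma_{\min} e - \sigma + s \\
\sigma - \sigma_{\max} e + t \\
Z s - \rho e \\
W t - \rho e
\end{pmatrix}
= 0, \qquad (14)$$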



where $\nabla_s \mathcal{L} = Z s - \rho e$,
$\nabla_t \mathcal{L} = W t - \rho e$. The search direction is given by
$\Delta\Phi := (\Delta\varphi, \Delta\sigma, \Delta\lambda, \Delta\eta, \Delta z, \Delta w, \Delta s, \Delta t)$.
The update $\Phi \leftarrow \Phi + \Delta\Phi$ is determined by the increment
$\Delta\Phi$ computed by using the Newton method for the following
$\rho$-dependent system of equations:

$$F'_\rho(\Phi)\, \Delta\Phi = -F_\rho(\Phi), \qquad (15)$$

where (15) is often referred to as the primal-dual system and solved at each
iteration with a decreasing parameter $\rho$. More precisely, (15) is
equivalent to:



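$$\begin{pmatrix}
0 & \mathcal{L}_{\varphi\sigma} & \mathcal{L}_{\varphi\lambda} & 0 & 0 & 0 & 0 & 0 \\
\mathcal{L}_{\sigma\varphi} & \mathcal{L}_{\sigma\sigma} & \mathcal{L}_{\sigma\lambda} & \mathcal{L}_{\sigma\eta} & -I & I & 0 & 0 \\
\mathcal{L}_{\lambda\varphi} & \mathcal{L}_{\lambda\sigma} & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & \mathcal{L}_{\eta\sigma} & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & -I & 0 & 0 & 0 & 0 & I & 0 \\
0 & I & 0 & 0 & 0 & 0 & 0 & I \\
0 & 0 & 0 & 0 & S & 0 & Z & 0 \\
0 & 0 & 0 & 0 & 0 & T & 0 & W
\end{pmatrix}
\begin{pmatrix}
\Delta\varphi \\ \Delta\sigma \\ \Delta\lambda \\ \Delta\eta \\ \Delta z \\ \Delta w \\ \Delta s \\ \Delta t
\end{pmatrix}
= -
\begin{pmatrix}
\nabla_\varphi \mathcal{L} \\ \nabla_\sigma \mathcal{L} \\ \nabla_\lambda \mathcal{L} \\ \nabla_\eta \mathcal{L} \\
\nabla_z \mathcal{L} \\ \nabla_w \mathcal{L} \\ \nabla_s \mathcal{L} \\ \nabla_t \mathcal{L}
\end{pmatrix}, \qquad (16)$$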



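where $I$ stands for the identity matrix, and $S = \mathrm{diag}(s_i)$,
$Z = \mathrm{diag}(z_i)$, $T = \mathrm{diag}(t_i)$, and
$W = \mathrm{diag}(w_i)$ are diagonal matrices. Note that
$\mathcal{L}_{\lambda\varphi} = A(\sigma)$ is the stiffness matrix of the
electric potential equation, $\mathcal{L}_{\sigma\sigma}$ is a diagonal
matrix, and $\mathcal{L}_{\eta\sigma} = \nabla g(\sigma)^T$ is just one row
vector.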

The primal-dual matrix $F'_\rho(\Phi)$ in (15) is sparse, nonsymmetric,
indefinite, and usually well-conditioned. Our approach is to transform
$F'_\rho(\Phi)$ to a smaller (so called condensed) matrix, which is
inherently ill-conditioned, but the ill-conditioning should not necessarily
be avoided and has no negative consequences. For a detailed discussion, see,
e.g., [12]. We eliminate the increments for $s$ and $t$ from the 5th and 6th
rows of (16), namely, $\Delta s = \Delta\sigma - \nabla_z \mathcal{L}$,
$\Delta t = -\Delta\sigma - \nabla_w \mathcal{L}$. From the last two rows of
(16) we obtain the increments for $z$ and $w$:

$$\begin{aligned}
\Delta z &= S^{-1} \left( -\nabla_s \mathcal{L} - Z\, (\Delta\sigma - \nabla_z \mathcal{L}) \right), \\
\Delta w &= T^{-1} \left( -\nabla_t \mathcal{L} - W\, (-\Delta\sigma - \nabla_w \mathcal{L}) \right).
\end{aligned} \qquad (17)$$

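Substituting (17) in the second row of (16), we get the following linear
system:

$$\begin{pmatrix}
0 & \mathcal{L}_{\varphi\sigma} & \mathcal{L}_{\varphi\lambda} & 0 \\
\mathcal{L}_{\sigma\varphi} & \tilde{\mathcal{L}}_{\sigma\sigma} & \mathcal{L}_{\sigma\lambda} & \mathcal{L}_{\sigma\eta} \\
\mathcal{L}_{\lambda\varphi} & \mathcal{L}_{\lambda\sigma} & 0 & 0 \\
0 & \mathcal{L}_{\eta\sigma} & 0 & 0
\end{pmatrix}
\begin{pmatrix}
\Delta\varphi \\ \Delta\sigma \\ \Delta\lambda \\ \Delta\eta
\end{pmatrix}
= -
\begin{pmatrix}
\nabla_\varphi \mathcal{L} \\ \tilde{\nabla}_\sigma \mathcal{L} \\ \nabla_\lambda \mathcal{L} \\ \nabla_\eta \mathcal{L}
\end{pmatrix}, \qquad (18)$$

where
$\tilde{\mathcal{L}}_{\sigma\sigma} = \mathcal{L}_{\sigma\sigma} + S^{-1} Z + T^{-1} W$
and the modified entry for the right-hand side is

$$\tilde{\nabla}_\sigma \mathcal{L} = \nabla_\sigma \mathcal{L} + S^{-1} (\nabla_s \mathcal{L} - Z\, \nabla_z \mathcal{L}) - T^{-1} (\nabla_t \mathcal{L} - W\, \nabla_w \mathcal{L}).$$

Transforming iterations, proposed in [11], for the null space decomposition
of the condensed matrix are applied to compute the search direction, see [9].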
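The elimination that leads from the full primal-dual system to the condensed one can be checked numerically on a small example. The sketch below fills the blocks with random stand-ins that only share the block structure described above (a nonsingular square block for the stiffness matrix, a diagonal $\sigma\sigma$ block, positive slacks and multipliers); the sizes, names and random data are assumptions for illustration and it uses dense solves rather than the transforming iterations of [11].

```python
import numpy as np

def make_blocks(m, N, seed=0):
    """Random stand-ins with the block structure of the primal-dual matrix."""
    rng = np.random.default_rng(seed)
    blk = {
        "A": 4.0 * np.eye(m) + 0.1 * rng.standard_normal((m, m)),  # L_{lambda phi}
        "B": 0.1 * rng.standard_normal((m, N)),                    # L_{phi sigma}
        "C": 0.1 * rng.standard_normal((N, m)),                    # L_{sigma lambda}
        "D": np.diag(1.0 + rng.random(N)),                         # L_{sigma sigma}
        "g": 0.5 + rng.random((N, 1)),                             # L_{sigma eta}
    }
    for name in "szwt":
        blk[name] = 0.5 + rng.random(N)    # strictly positive s, z, w, t
    return blk

def solve_full(blk, grads):
    """Solve the full 8-block system; grads = gradients w.r.t.
    (phi, sigma, lambda, eta, z, w, s, t)."""
    A, B, C, D, g = (blk[k] for k in "ABCDg")
    m, N = B.shape
    S, Z, W, T = (np.diag(blk[k]) for k in "szwt")
    I, O = np.eye(N), np.zeros
    K = np.block([
        [O((m, m)), B,         A.T,       O((m, 1)), O((m, N)), O((m, N)), O((m, N)), O((m, N))],
        [B.T,       D,         C,         g,         -I,        I,         O((N, N)), O((N, N))],
        [A,         C.T,       O((m, m)), O((m, 1)), O((m, N)), O((m, N)), O((m, N)), O((m, N))],
        [O((1, m)), g.T,       O((1, m)), O((1, 1)), O((1, N)), O((1, N)), O((1, N)), O((1, N))],
        [O((N, m)), -I,        O((N, m)), O((N, 1)), O((N, N)), O((N, N)), I,         O((N, N))],
        [O((N, m)), I,         O((N, m)), O((N, 1)), O((N, N)), O((N, N)), O((N, N)), I],
        [O((N, m)), O((N, N)), O((N, m)), O((N, 1)), S,         O((N, N)), Z,         O((N, N))],
        [O((N, m)), O((N, N)), O((N, m)), O((N, 1)), O((N, N)), T,         O((N, N)), W],
    ])
    return np.linalg.solve(K, -np.concatenate(grads))

def solve_condensed(blk, grads):
    """Solve the condensed 4-block system and recover the eliminated increments."""
    A, B, C, D, g = (blk[k] for k in "ABCDg")
    m, N = B.shape
    s, z, w, t = (blk[k] for k in "szwt")
    r_phi, r_sig, r_lam, r_eta, r_z, r_w, r_s, r_t = grads
    O = np.zeros
    D_t = D + np.diag(z / s + w / t)                       # modified sigma-sigma block
    r_sig_t = r_sig + (r_s - z * r_z) / s - (r_t - w * r_w) / t
    Kc = np.block([
        [O((m, m)), B,   A.T,       O((m, 1))],
        [B.T,       D_t, C,         g],
        [A,         C.T, O((m, m)), O((m, 1))],
        [O((1, m)), g.T, O((1, m)), O((1, 1))],
    ])
    d = np.linalg.solve(Kc, -np.concatenate([r_phi, r_sig_t, r_lam, r_eta]))
    d_phi, d_sig, d_lam, d_eta = np.split(d, [m, m + N, 2 * m + N])
    d_s = d_sig - r_z
    d_t = -d_sig - r_w
    d_z = (-r_s - z * d_s) / s
    d_w = (-r_t - w * d_t) / t
    return np.concatenate([d_phi, d_sig, d_lam, d_eta, d_z, d_w, d_s, d_t])
```

Both routines return the increments in the same order, so they can be compared entry by entry. The condensed matrix has $2m + N + 1$ rows instead of $2m + 5N + 1$, which is the saving the condensation is after.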



3 Interior-Point Method

We apply the line-search version of the Newton method. After computation of
the search direction $\Delta\Phi$, a common steplength $\alpha$
($\alpha > 0$) is employed to update the solution
$\Phi \leftarrow \Phi + \alpha\, \Delta\Phi$. In all Newton-type methods,
$\alpha = 1$ is almost always the "ideal" value. Choosing $\alpha$ at each
iteration becomes more complex because, as is well known, Newton's method may
diverge for general nonlinear problems with a poor initial estimate. A
complete convergence analysis of the Newton interior-point method for
nonlinear programming is given in [1], provided the Jacobian
$F'_\rho(\Phi)$ of the system (14) remains nonsingular.

A standard approach for choosing the steplength $\alpha$ is to define a
suitable merit function that measures the progress towards the solution. The
squared $\ell_2$-norm of the residual was introduced as a merit function in
[1]:

$$M(\Phi) = \| F(\Phi) \|_2^2, \qquad (19)$$

where $F(\Phi) := F_\rho(\Phi) + \rho e$, see (14), and
$e = (0, \ldots, 0, 1, \ldots, 1)$ is a vector with $2N$ ones. We use the
following notation: $M_k = M_k(0) = M(\Phi_k)$ and
$M_k(\alpha) = M(\Phi_k + \alpha\, \Delta\Phi_k)$, where $\Phi_k$ is the
computed solution at a given iteration.
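The role of the merit function in a line search can be illustrated on a generic nonlinear system $F(x) = 0$. The sketch below implements step halving with a sufficient decrease test on $M(x) = \|F(x)\|_2^2$; it is a simplified illustration, not the algorithm of [1], and the function names, the default constant `beta`, and the scalar test problem are assumptions.

```python
import numpy as np

def merit(F, x):
    """Squared l2-norm of the residual, M(x) = ||F(x)||_2^2."""
    r = F(x)
    return float(r @ r)

def backtrack(F, J, x, d, beta=1e-4, alpha0=1.0, max_halvings=30):
    """Halve alpha until M(x + alpha d) <= M(x) + alpha * beta * grad M(x)^T d."""
    M0 = merit(F, x)
    slope = 2.0 * F(x) @ (J(x) @ d)        # directional derivative of M along d
    alpha = alpha0
    for _ in range(max_halvings):
        if merit(F, x + alpha * d) <= M0 + alpha * beta * slope:
            return alpha
        alpha *= 0.5
    return alpha
```

For an exact Newton direction $d = -J^{-1}F$ the slope equals $-2M(x) < 0$, so $d$ is a descent direction for the merit function. On $F(x) = \arctan(x)$ started at $x = 3$, the full Newton step increases the residual, and the halving rule accepts a shorter step instead.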

To specify the selection of $\alpha$, we apply the algorithm proposed by [1].
Let $\Phi_0 = (\varphi_0, \sigma_0, \lambda_0, \eta_0, z_0, w_0, s_0, t_0)$
be a given starting point with $(z_0, w_0, s_0, t_0) > 0$. Let

$$\tau = \min(Z_0 s_0, W_0 t_0) \,/\, [(z_0^T s_0 + w_0^T t_0)/(2N)].$$

We denote by

$$\Phi(\alpha) := (\varphi(\alpha), \sigma(\alpha), \lambda(\alpha), \eta(\alpha), z(\alpha), w(\alpha), s(\alpha), t(\alpha)) = \Phi + \alpha\, \Delta\Phi.$$

For a given iteration $k$, we define

$$q_k(\alpha) = \min(Z(\alpha) s(\alpha), W(\alpha) t(\alpha)) - \gamma_k \tau\, (z(\alpha)^T s(\alpha) + w(\alpha)^T t(\alpha))/(2N),$$

where $\gamma_k \in (0, 1)$ is a constant. The steplength $\alpha_k$ is
determined as

$$\alpha_k = \max_{\alpha \in (0,1]} \{ \alpha : q_k(\alpha') \ge 0 \ \text{for all}\ \alpha' \le \alpha \}. \qquad (20)$$

Note that the function $q_k(\alpha)$ is piecewise quadratic and, hence,
$\alpha_k$ is either one or the smallest positive root of $q_k(\alpha)$ in
$(0, 1]$.

We describe now the primal-dual Newton interior-point algorithm.

Interior point algorithm:

1. Choose $\Phi_0 = (\varphi_0, \sigma_0, \lambda_0, \eta_0, z_0, w_0, s_0, t_0)$
   such that $(z_0, w_0, s_0, t_0) > 0$ and $\beta \in (0, 1/2]$. Set
   $k = 0$, $\gamma_{k-1} = 1$, and compute $M_0 = M(\Phi_0)$. For
   $k = 0, 1, 2, \ldots$, do the following steps:

2. Test for convergence: if $M_k \le \epsilon_{\mathrm{exit}}$, stop.

3. Choose $\sigma_k \in (0, 1)$; for $\Phi = \Phi_k$, compute the perturbed
   Newton direction $\Delta\Phi_k$ from (15) with a perturbation parameter

   $$\rho_k = \sigma_k\, (z_k^T s_k + w_k^T t_k)/(2N). \qquad (21)$$

4. Steplength selection.

   (4a) Choose $1/2 \le \gamma_k \le \gamma_{k-1}$; compute $\alpha_k$ from
        (20).

   (4b) Let $\alpha_k = \alpha_k / 2^n$, where $n \ge 0$ is the smallest
        integer such that

        $$M_k(\alpha_k) \le M_k(0) + \alpha_k\, \beta\, \nabla M_k^T \Delta\Phi_k. \qquad (22)$$

5. Let $\Phi_{k+1} = \Phi_k + \alpha_k\, \Delta\Phi_k$ and
   $k \leftarrow k + 1$. Go to 2.

It was shown in [1] that for the proposed choice of $\rho_k$ in (21), the
search direction $\Delta\Phi_k$, generated by the interior-point algorithm,
gives descent for the merit function $M(\Phi_k)$, i.e.,
$\nabla M_k^T \Delta\Phi_k < 0$, where $\nabla M_k$ is the derivative of
$M_k(\alpha)$ at $\alpha = 0$.

4 Numerical Experiments

In this section, we give some details concerning our computations. We solve
the optimization problem (11)-(12) with an objective function defined in (8).
The first equality constraint is related to solving the elliptic differential
equation for the electric potential $\varphi$, see (5). We allow here some
modification in the conductivity, namely, we consider
where



h{a) = 



0.01 






(23) 



is treated as a conductivity. Neumann boundary conditions were imposed,
assuming that the compatibility condition from Section 1 is satisfied. The
computations have been carried out on a rectangular domain $\Omega$
decomposed into $N$ uniform quadrilateral finite elements. We suppose that
the domain is an isotropic conductor. The conductivity is computed at the
center points of the finite elements and the electric potential is
approximated at the midpoints of the edges. Due to the definition (23), the
diagonal matrix $\mathcal{L}_{\sigma\sigma}$ does not vanish.

Our primal-dual code was written in C++ using double precision binary
arithmetic. All numerical tests were run on an Alpha PC164LX machine. We
choose lower and upper limits for the conductivity $\sigma_{\min} = 0.01$ and
$\sigma_{\max} = 1$, respectively. In all runs, an initial homogeneous
distribution was proposed with $\sigma = 0.45$. The constant $C$ in (11) is
computed in accordance with this initialization. The following parameters for
the interior-point algorithm in Section 3 are used:
$\sigma_k = \min(0.2,\ 100\, (z_k^T s_k + w_k^T t_k))$, $\beta = 0.0001$, and
$\epsilon_{\mathrm{exit}} = 10^{-6}$.

The most expensive (in terms of CPU-time) part of the algorithm during a
given iteration is to solve the condensed primal-dual system for the
increments. Two transforming iterations have been used with a zero initial
guess. The preconditioned conjugate gradient (PCG) method is applied with the
symmetric successive overrelaxation (SSOR) iteration as a preconditioner for
the stiffness matrix. We choose a relaxation parameter $\omega = 1.5$ and a
stopping criterion for both iterative procedures
$r^T A(h(\sigma))\, r < 10^{-10}$, where $r$ is the current residual.

The results from our numerical experiments are reported in Table 1 for
various numbers of contacts NC and various numbers of finite elements N. The
dimension of the stiffness matrix is denoted by NP. We also report the global
number of iterations in the main optimization loop, denoted by iter, the
perturbation parameter $\rho$ and the merit function $M(\Phi)$ at the last
iteration.



Table 1. Results from applications of the interior-point algorithm 



NC    N     NP    iter   p         M
2     30    71    20     1.13e-4   5.11e-7
2     40    93    14     2.17e-5   3.42e-8
2     80    178   18     7.03e-5   5.76e-8
2     80    178   22     5.08e-4   2.05e-7
2     120   262   34     3.93e-5   1.27e-8
3     30    71    25     6.03e-4   4.18e-7
3     64    144   41     5.17e-5   8.03e-8
4     96    212   45     3.12e-4   4.32e-7
5     180   388   42     1.18e-4   2.84e-7

Fig. 1. Conductivity distribution for a mesh 30 x 40 with 5 contacts



Figure 1 shows the conductivity distribution for a 30 x 40 mesh with five
contacts. Black indicates elements where the conductivity is very close to
$\sigma_{\max}$ and white indicates elements with a conductivity close to
$\sigma_{\min}$.

Abstract. The objective of this paper is to calculate the residual stress
in drawn wire, taking into account the temperature induced by plastic
dissipation energy. Finite element analysis (FEA) is applied to the simulation
of wire drawing. The general purpose FEA code MARC is used to analyse
thermo-coupled wire drawing processes. A necessary condition for determining
the range of steady state flow is proposed.



1 Introduction 

Wire drawing involves a complicated deformation process with material,
geometrical and contact nonlinearities. One of the vital characteristics of
drawn wire is the distribution of the residual stress in it. To obtain an
optimised design of such forming processes, detailed investigations of the
deformation, namely the residual stress and deformation state, microlevel
changes and cracks, are extremely important.

The numerical simulation of wire drawing by means of finite elements has
been dealt with by many authors: Davas W. and Fischer F.D.; Boris S.; Doege E.
and Kroeff A. [9,8,10,1].

In [8] a steady-state cold wire drawing model based on a Lagrangian incremental
elastic-plastic formulation is considered. The general purpose finite element
program ABAQUS was used to solve a 2D wire drawing finite element model.
Two different optimisation problems associated with optimal die design were
considered: minimisation of the total energy in the process and maximisation of
the reduction in area.

There are metal forming processes in which thermo-mechanical coupling
investigations are necessary. For example, the deformation and friction during
aluminium extrusion cause considerable temperature increases (up to more than
100°C) [2].

During the wire drawing process, large nonhomogeneities in deformation, and
consequently in heat generation, usually occur. Moreover, especially if the dies
are at a considerably lower temperature than the workpiece, the heat losses
by conduction to the dies and by radiation and convection to the environment
contribute to the existence of severe temperature gradients. The friction forces
between workpiece and dies are a heat source, which is very important when mild
materials are considered. Thus including temperature effects in the analysis of
wire drawing problems is very important. Furthermore, at elevated temperatures
plastic deformation can induce phase transformations and alterations in grain
structures, which in turn modify the flow resistance of the material as well as
other mechanical properties. It is necessary to include these metallurgical
effects in thermo-mechanical coupling models.

The influence of friction in wire drawing is very important. A new upsetting
sliding test is used in [4] for the determination of the friction coefficient by
simulating the wire drawing contact conditions. The test is performed on the
real workpiece taken directly from the drawing plant, so the result is directly
usable for the finite element simulation of the wire drawing process.

The design, control and optimisation of wire-drawing metal forming processes
by means of classical trial-and-error procedures become increasingly costly
in terms of time and money in a competitive environment. At the same time,
improving the final product requires the microstructure, constitutive behaviour
and deformability to be known a priori for a targeted application.
In recent years, numerical simulations have become a very efficient tool
for reaching these goals. It is well known that residual stresses are induced by
fabrication processes and that those stresses are superimposed on the service
stresses, especially in surface layers where, in most cases, fatigue or stress
corrosion cracks initiate.

The aim of finite element simulation of the wire-drawing process is the
prediction of local values of strain rates, strains, stresses and temperature
during deformation, with a view to obtaining some insight into the effect of
the process on the final mechanical properties: texture, anisotropy, residual
stress and die wear. However, reliable predictions from numerical simulations
require reliable input data, including constitutive laws and friction conditions.



2 Residual Stresses in Wire-Drawing Process 

Residual stresses are effective static stresses which are in a state of
equilibrium without the action of external forces and/or moments. They occur
whenever a macroscopic cross-sectional area of a component or a microscopic area
of a multi-phase material is partially and plastically deformed by external and
internal forces. These forces may be due to thermal loading, processes of
diffusion or phase transformation, in such a way that incompatibilities of
deformations may be caused [3].

Due to the plastic deformation in wire drawing, most of the mechanical energy
expended in the deformation process is converted into heat and the remainder
is stored in the material. The stored energy is associated with the residual
stress generated in the wire after plastic deformation and unloading, as well
as with the creation of lattice imperfections. This means that the stresses in
the force-free state are the residual stresses in the wire. The nature of this
residual stress is plastic deformation as well as changes on the micro-level.

Metal forming processes (especially wire drawing and extrusion) commonly
generate non-homogeneous plastic deformation in the workpiece, so that the final
product is left in a state of residual stress. In cold working the yield stress
of the workpiece is higher than in warm or hot working, so that the residual
stresses produced in cold working are in general higher.

Residual stresses can have a deleterious or beneficial effect on fatigue
strength. Hence, investigating the generation of residual stresses in metal
forming can be important from the standpoint of either avoiding defects by
reducing residual stresses or tailoring the die geometry to produce highly
beneficial stresses.

The causes of residual stresses can be classified under the following three
main groups: material, manufacturing, and loading and service conditions.

Manufacture-induced residual stresses can be determined both by calculation
and by experiment. An experimental determination of the residual stress in the
cementite and ferrite phases of high carbon steel was provided by Houtte in [6].

Most previous analyses of metal-forming processes were based on rigid-plastic
theory and models; unfortunately, they cannot provide information concerning
residual stresses in wire drawing [7].

The residual stresses generated by metal-forming processes occur because
of variations in the plastic strain distribution which are of the order of
elastic strains. Small changes in the forming tool configuration can have a
dominant influence on the residual stresses. This suggests that the die design
could be chosen to produce beneficial stresses. Basic aspects of the problem,
which might become significant in view of the sensitivity of the stress
distribution to small changes, are the die geometry and boundary conditions such
as friction in the boundary value problem. This type of sensitivity analysis
could be included in the finite element calculations. It would be useful to
develop models and FE codes that incorporate more general material properties,
such as anisotropic hardening and the influence of texture on material
properties.

In the present investigation an FE solution is used to obtain the fields of
stresses, strains, plastic strains and residual stresses.

3 Finite Element Method Application 

A finite element approach based on the displacement method was applied. The
governing matrix equations for the thermo-mechanically coupled problem in the
case without dynamic effects are as follows:

$$[K(T,u)]\{u\} = \{F\}, \tag{1}$$

$$[C(T)]\dot{T} + [H(T,u)]T = Q + Q^p + Q^f, \tag{2}$$

where T is the temperature, {u} is the displacement vector, [K(T, u)] is the
stiffness matrix, [C(T)] is the heat-capacity matrix and [H(T, u)] is the
thermal-conductivity matrix. All of them depend on temperature and, in the case
of an updated Lagrangian analysis, [K(T, u)] and [H(T, u)] also depend on the
prior displacement. $Q^p$ is the internal heat generation due to inelastic
deformation and $Q^f$ is the heat generation due to the friction between
workpiece and die. The coupling between the heat transfer problem and the
mechanical problem is due to the temperature-dependent mechanical properties
and the internal heat generated.



4 Steady State Flow Area 



One of the problems is how to determine the minimum length of the wire piece
for which a steady state flow area can be reached in the numerical simulation.
The steady state area is defined as a set of cross sections of the wire where
the local error between axial residual stresses satisfies the following energy
condition:



or 



hi-gjWh^ 
£ ^



hi-djWL^ 
( 3 )

( 4 ) 



where $g_i$ and $g_j$ are the axial residual stress functions in cross sections
i and j in the force-free state, $N_k$ is the number of Gauss integration points
($x_k$) or nodal points ($x_k$) in a cross section, $\varepsilon$ is a small
constant and $i, j = 1, 2, 3, \dots, N$, $i \ne j$, where N is the number of
cross sections.

A cross section can be defined as a sequence of finite elements or as a sequence
of nodes. In the proposed definition the $L_2$ norm is used, which is the same
as in finite element approximation theory. It is assumed that the deformed wire
body has passed through the die and a force-free state has been reached in the
wire.
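One possible way to apply such a criterion to simulation output is sketched below. It is an illustration under stated assumptions, not the authors' procedure: each cross section is represented by its axial residual stress sampled at the same $N_k$ points, the distance between two sections is taken as the plain discrete $L_2$ norm of the difference, and the steady state area is taken to be the longest run of consecutive sections that are pairwise within $\varepsilon$. The names `l2_distance` and `steady_state_sections` are hypothetical.

```python
import math


def l2_distance(g_i, g_j):
    """Discrete L2 distance between two stress profiles sampled at the same points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(g_i, g_j)))


def steady_state_sections(profiles, eps):
    """Return the indices of the longest run of consecutive cross sections
    whose profiles are pairwise within eps of each other."""
    best = []
    n = len(profiles)
    for start in range(n):
        end = start
        # extend the run while the next section is close to every section in it
        while end + 1 < n and all(
            l2_distance(profiles[end + 1], profiles[k]) <= eps
            for k in range(start, end + 1)
        ):
            end += 1
        if end - start + 1 > len(best):
            best = list(range(start, end + 1))
    return best
```

The length of wire covered by the returned sections then gives an estimate of the extent of the steady state zone for the chosen tolerance.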



5 Numerical Example 

A thermo-coupled wire-drawing problem is considered as a large deformation
problem incorporating thermo-mechanical coupling. The kinematics of deformation
is described by the updated Lagrangian approach, which is useful in cases where
rotations are large, so that nonlinear terms in the curvature expressions may no
longer be neglected, and for calculations in which the plastic deformations
cannot be assumed infinitesimal. The updated Lagrangian formulation takes the
reference configuration at t = n + 1; Cauchy stress and true strain are used in
the constitutive relationship.

Two examples of the thermo-coupled wire drawing problem are considered. They
differ only in the die model: in the first case the die is modelled as a rigid
body and in the second case as a deformable one. A four-node bilinear
axisymmetric finite element is used. Temperature-dependent material data and
geometrical data are given in Table 1.

A finite element simulation of wire drawing was performed under the following
assumptions:

— the deformation of the workpiece is axisymmetric;

— the material exhibits elastic-plastic behaviour in both the loading process,
during which the wire moves through the die, and the unloading process,
after the wire has emerged from the die;

— a constant friction coefficient $k_f = 0.1$ in the Coulomb friction law is
assumed at the die and workpiece interface;

— the material is homogeneous and isotropic with non-linear hardening;

— as the temperature changes, thermal stresses develop due to the nonzero
coefficient of thermal expansion;

— as the temperature changes, the mechanical properties change (softening),
because a temperature-dependent flow stress is assumed;

— as the geometry changes, the heat transfer boundary value problem changes;
this includes changes in the contacting interface;

— as plastic work is performed, internal heat is generated;

iQp = Sv^.^^"imdv) 

— as the bodies slide, friction generates heat {Qf = | Fy || Ur | [iVJdS'/) 

Only one wire-drawing pass with a 24% reduction in area is simulated, which is
why no FE rezoning procedure was applied. Fig. 1 and Fig. 2 show the initial FE



Table 1. Material and geometrical data for the FE model and material data for 
thermo coupled wire drawing problem for Aluminium 1100 AT 



workpiece - data 


die - data 


FE model data 


initial lenght L = 82mm 
E = lOOOA/mm^ ; v = 0.33 
masss density l.Qglmm^ 

Gy = 8AN jmrr? at 200° 
coeff. of friction kf = 0.1 
workhardening data 
plastic strain - flow stress 
0.00 — 3.400 
0.15 — 5.100 
0.70 — 5.780 
5.00 — 6.000 


outlet angle 7 = 38.66° 
inlet angle a — 9.65° 
approach angle /3 = 9.65° 
reduction in area Ra = 24% 
inlet radius Do = 15mm 
outlet radius Df = 13mm 
approach zone la = 30mm 
bearing zone h = 5mm 
outlet zone lo = 5mm 
E = l.lO^N/mm^ ; i/ = 0.33 
mass density l.Og/mm^ 


wire: 430 EE 
4 node axis. FE 
1033 nodes 
Die : 24 EE 
4 nodes FE 
rigid die 


initial temp. - 427° C , Al. 

= 242A/s°A 

c™ = 2.4255242A/mm^°A 
h = 0.007 N/ smm° K 
Teylor-Quiny coeff. Qt = 0.9 
Gy decrease at rate 

0.007 N/mm^ when the temp, increase 


initial temp. - 20° C 
fed = 19N/s°K 
Cd = 3.77 N/mm'^°K 
hiubr = 35N/ smm° K 





mesh and geometry of the die and workpiece in both cases. Fig. 3 and Fig. 4 show
the temperature field distribution in the die and workpiece in both cases.
Fig. 5 and Fig. 6 show the residual stress distribution in both cases.

Fig. 1. Initial finite element mesh and geometry of work piece and die in the
case with a rigid die







Fig. 2. Initial geometry and FE mesh of workpiece and deformable die 







Fig. 3. Temperature field distribution in approach zone and bearing zone for
inc. 550

Fig. 4. Temperature field distribution in wire and die for inc. 800




Fig. 5. Distribution of the residual stresses: 1 - axial, 2 - radial and
3 - circumferential, in cross section 26-57 in the case with a rigid die




Fig. 6. Distribution of the residual stresses: 1 - axial, 2 - radial and
3 - circumferential, in cross section 26-57 in the case with a deformable die

6 Conclusions

The proposed numerical condition (3) can successfully be used to determine the
minimum length of workpiece for which steady state flow will be reached. FE
simulation of the thermo-coupled wire drawing process can be used to predict
greater homogeneity in the final product, to optimise the process parameters
(die geometry and load parameters) and to reduce die wear.

Abstract. We first discuss the difficulties that arise in the construction
of difference schemes on uniform meshes for a specific elliptic interface
problem. Estimates for the rate of convergence in discrete energetic
Sobolev norms compatible with the smoothness of the solution are also
presented.



1 Introduction 

Interface problems occur in many physical applications. We present a model case
below to show the characteristics of this type of interface problem. Namely, in
the region $\Omega = (0, 1)^2$ we consider the Dirichlet problem

$$-\Delta u + c(x)\,\delta_S(x)\,u = f(x), \quad x \in \Omega; \tag{1}$$

$$u = 0, \quad x \in \Gamma = \partial\Omega, \tag{2}$$

where S is a continuous curve (for example, a closed curve), $S \subset \Omega$,
and $\delta_S(x)$ is the Dirac delta function concentrated on S. We suppose that

$$c(x) \in L_\infty(S), \quad 0 < c_0 \le c(x) \le c_1 \tag{3}$$

almost everywhere on S.

We assume for simplicity that the curve S separates $\Omega$ into two regions:
$\Omega = \Omega_1 \cup \Omega_2$, $\Omega_1 \cap \Omega_2 = \emptyset$. Then,
under some smoothness assumptions, equation (1) can be rewritten as follows:

$$-\Delta u = f(x), \quad x \in \Omega_1 \cup \Omega_2; \qquad [u]_S = 0, \quad \left[\frac{\partial u}{\partial \nu}\right]_S = c(x)\,u, \tag{4}$$

where $\partial u/\partial \nu$ is the normal derivative.

A classification of interface problems is given in [1]. The most noticeable
characteristic of the present interface problem is the singular coefficient.
This brings up several substantial difficulties in the numerical analysis, some
of which are discussed here. The first difficulty arises in the discretization.
Many current techniques, such as harmonic averaging or coefficient smoothing,
fail to give high accuracy in two or higher dimensions. The immersed interface
method (IIM), developed in recent years for many other interface problems
[1], [2], [3], is not easy to apply to (1), (2); see [4], [5] for
one-dimensional problems. In order to achieve second-order accuracy one must use
a 4-point stencil in the one-dimensional case and a 12-point stencil in the
two-dimensional case with a line interface. See also the discussion in Section 2
of the present paper.

Because of the discontinuity and non-smoothness of the solution and the
complexity of the interface, it is difficult to perform error and convergence
analysis in the conventional way. Due to the presence of the interface and the
discontinuity or localized nonlinearities, the system of discrete equations
loses many nice properties, such as symmetric positive definiteness and diagonal
dominance. In [5] algorithms are proposed for decoupling the linear and
nonlinear equations of the discrete systems. For two-dimensional problems with a
curvilinear interface it is not clear how to perform such a decoupling.

The structure of the article is as follows. In Sect. 2 we derive and compare 
numerically three difference schemes in the one- and two-dimensional case. In 
Sect. 3 we formulate some convergence results for linear problems of type (1), (2). 

2 Construction of the Difference Scheme 

In the present Section three difference schemes are discussed and compared nu- 
merically. 

2.1 One-Dimensional Case 

We will analyze three schemes on the computational example

$$u'' - w u = 1 + K\,\delta(x - \zeta)\,u, \quad K > 0, \quad w = \mathrm{const} > 0, \quad u(0) = u(1) = 0. \tag{5}$$

Let us specify a fixed uniform grid $x_i = ih$, $i = 0, \dots, N$, $hN = 1$. We
wish to solve equation (5) on a uniform mesh only, and the point $\zeta$ will in
general not lie on a grid point: $x_I < \zeta < x_{I+1}$, $1 < I < N$. Therefore
the delta function must be replaced by an appropriate discrete approximation
$d_h(x)$, for example by the "hat function" with support $(-h, h)$:

$$d_h^{(1)}(x) = \begin{cases} (h - |x|)/h^2, & |x| < h, \\ 0, & \text{otherwise.} \end{cases}$$

We shall make use of the following lemma [3].

Lemma 1. Suppose $d_h(x)$ satisfies $d_h(x) = 0$ for $|x| > Mh$ and also the
discrete moment condition

$$h \sum_j (x_j - \zeta)^m\, d_h(x_j - \zeta) = \delta_{m0} \quad \text{for } m = 0, 1, \dots, p - 1.$$

If $f \in C^{p-1}([\zeta - Mh, \zeta + Mh])$ and $f^{(p-1)}$ is Lipschitz
continuous on the interval, then

$$f(\zeta) - h \sum_j f(x_j)\, d_h(x_j - \zeta) = O(h^p) \quad \text{as } h \to 0.$$
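The moment conditions of Lemma 1 are easy to check numerically. The sketch below does so for the standard hat function $(h - |x|)/h^2$, which satisfies the conditions for m = 0 and m = 1 (so p = 2); the function names and the choice of test point are assumptions made only for this illustration.

```python
import math


def hat_delta(x, h):
    """Hat-function approximation of the delta function with support (-h, h)."""
    return (h - abs(x)) / h ** 2 if abs(x) < h else 0.0


def discrete_moment(m, zeta, h, n):
    """h * sum_j (x_j - zeta)^m d_h(x_j - zeta) on the grid x_j = j*h, j = 0..n."""
    return h * sum(
        (j * h - zeta) ** m * hat_delta(j * h - zeta, h) for j in range(n + 1)
    )


def delta_interpolate(f, zeta, h, n):
    """Approximate f(zeta) by h * sum_j f(x_j) d_h(x_j - zeta)."""
    return h * sum(f(j * h) * hat_delta(j * h - zeta, h) for j in range(n + 1))
```

With the hat function the sum reduces to linear interpolation between the two grid points that bracket $\zeta$, which is why the approximation error behaves like $O(h^2)$ for smooth f.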

First we discuss an integro-interpolation scheme for equation (5). For
$i \ne I, I+1$ we set

$$y_{\bar{x}x,i} - w y_i = 1,$$

where $y_x$ and $y_{\bar{x}}$ are the standard forward and backward finite
differences (see [6]). To get the difference equation at $x_i = x_I$ we
integrate (5) from $x_{I-0.5}$ to $x_{I+0.5}$:

$$u'(x_{I+0.5}) - u'(x_{I-0.5}) - w \int_{x_{I-0.5}}^{x_{I+0.5}} u\,dx = h + K u(\zeta).$$

One can use the formula $u'(x_{I-0.5}) = (u(x_I) - u(x_{I-1}))/h + O(h^2)$, but
a similar approximation of $u'(x_{I+0.5})$ leads to a local truncation error
$O(h^{-1})$. In spite of this, the integro-interpolation scheme (II) in the
following form

K 

yxx,i-wyi = l + 1 = 1 ,. ..,N -I, 5 +i = 

h 

can be found in the literature. 

More accurate scheme one obtains after applying the averaging operator 
(defined below) to (5): 

Uxx,i-^J (1 - u(x) dx = 1, i^ 1 , 1 + 1 , 



Jo, i^ I, 

\1, i = I 



Uxx,i-^(^ J {x-xi-i)u{x)dx+ J {xi+i-x)u{x)dx^-^{xi+i-C)u{Q = 1 , 

Xl-i XI 

Uxx,I+l-^(^ J {x-xi)u{x)dx+ J {xi+2-x)u{x)dx^-^{(:;- Xl)u{C) = 1 . 



Xl + 1 



After approximations of the integrals and w(C) (by Lemma 1) we get the 
difference equations: 



1 



- Kda ) yi-i - ( + w + Kdb ] yi + [ ^ ~ Kdc ) j/7+1 - Kdayi+i = 1, 



/r2 






1 



/l2 



-Keayi-i + - Keb ]yi- { -= + w + Kec yi+i + 7:7 - Kea 7/7+2 = 1 



/l2 



1 



h^ 



where 



a=-pipi+i, b=l-p], c=l-p7+i, 
, Pi+i Pi C - XI xi+i - (

■'=— "“IT' "'’—ir' 

Now the truncation error is O(h) and comes only from the approximation of the
integrals and $u(\zeta)$. This scheme is denoted by AO in Table 1.

The main idea of the IIM consists in appropriate modification of the difference 
scheme at the irregular grid points which are near the interface by using the jump 
condition. For the example (5), we have 

«" (17) = »»7 - '•* [“']< - [”"lc- 

- [<■'"]< + o . 

7" (1/+1) = UxxJ+i + ^ — ^2 I** ]*■' ^ [7");+ 

[■u']^ = ATm(C), = 0, [u"']q = wKu{C). 

The value $u(\zeta)$ is approximated by Lemma 1. If $d_h$ satisfies Lemma 1 with
p = 2, then the resulting difference scheme has a 4-point stencil near the
interface and the truncation error is O(h), see [4], [5]. If p = 3, the stencil
has 6 points and the truncation error is $O(h^2)$. Here we present the IIM
scheme with the 4-point stencil, but Table 1 also contains the results for the
scheme with the 6-point stencil:

= 1, i = po = l/w = 0, 





yi-i 





VI + 





yi+i - kiayi+i = 1, 



ki = K{d+fw), k 2 = K{e + gw), f = 

6 6 

and a, b, c, d, e are the same as above. 

As the results in Table 1 show, the schemes AO and IIM - 4 points have the
same order of accuracy, but AO can easily be generalized to the two-dimensional
case.

Table 1. Truncation error of the difference schemes



N 


II 


AO 


IIM - 4 points 


IIM - 6 points 


19 


2,8733 


3,6060.10"^ 


5,6231.10"® 


8, 1478.10"® 


39 


6,5063 


1,8192.10"® 


2,8544.10"® 


2,0122.10"® 


79 


13,7544 


9, 1281. lO""' 


1,4416.10"® 


5,0011.10"® 


159 


28,2417 


4, 5713.10"® 


7,2442.10"® 


1,2467.10"® 


319 


57,2119 


2,2875.10"® 


3,6221.10"® 


3, 1123.10"® 



2.2 Two-Dimensional Case 

First we present an IIM scheme for the problem (1), (2) when S is a segment
parallel to one of the coordinate axes, for example
$S = \{(x_1, x_2) : x_2 = \zeta,\ 0 < x_1 < 1\}$. We use the uniform mesh
$\bar\omega = \{(x_{1i}, x_{2j}) : x_{1i} = ih,\ x_{2j} = jh,\ i, j = 0, 1, \dots, N\}$.
Let $\omega$ be the set of internal nodes and $\gamma$ the set of boundary
nodes. We assume that $x_{2J} < \zeta < x_{2,J+1}$, $1 < J < N - 1$. Now
$[u]_{x_2=\zeta} = 0$,

^ 0 u {xi,() and [ux 2 X 2 ]a: 2 =c = After some algebra we have 

-Ux^xi - U-X 2 X 2 + d^h'' - c) c(xii,C) = f{xii,X 2 j) + O {h) . 

For the approximation of u {xu, C) we use Lemma 1. If ytj = u {xu,X 2 j), = 

f{xii,X 2 j) then the difference scheme is as follows: 

^hljij — Uxixi ,ij Vx2X2^ij ~ ^ij i 1, j ^ J 

-Vi-1, J - Vi+i,J - (l - adK^Ci) Vij-I -I- (4 -I- bdh'^Ci) yij- 
- (l - cdh'^Ci) yi,j+i + adh‘^Ciyi^j +2 = 
ach‘^Ciyij-1 - (l - beK^Ci) ytj -I- (4 -|- ceh^Ci) -h aeh^yij+2~ 

— {Vi-l,J+l + Vi+l,J+l) = ^i,J+lh‘^- 

In the case when S is an arbitrary closed curve in $\Omega$ we will consider a
difference scheme with averaged right-hand side and coefficient. We define the
Steklov averaging operators as follows [7]:

$$T_1 f(x_1, x_2) = T_1^- f(x_1 + h/2, x_2) = T_1^+ f(x_1 - h/2, x_2) = \frac{1}{h} \int_{x_1 - h/2}^{x_1 + h/2} f(x_1', x_2)\,dx_1',$$

$$T_2 f(x_1, x_2) = T_2^- f(x_1, x_2 + h/2) = T_2^+ f(x_1, x_2 - h/2) = \frac{1}{h} \int_{x_2 - h/2}^{x_2 + h/2} f(x_1, x_2')\,dx_2'.$$

Notice that these operators commute and map the derivatives of a sufficiently
smooth function u into finite differences, for example

$$T_i^+ \frac{\partial u}{\partial x_i} = u_{x_i}, \qquad T_i^2 \frac{\partial^2 u}{\partial x_i^2} = u_{x_i \bar{x}_i}.$$

We approximate the boundary value problem (1), (2) on the mesh $\bar\omega$ with

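$$-\Delta_h y + \alpha y = \varphi \ \text{ in } \omega; \qquad y = 0 \ \text{ on } \gamma, \tag{6}$$

where $\varphi = T_1^2 T_2^2 f$ and $\alpha = T_1^2 T_2^2 (c\,\delta_S)$. The
coefficient $\alpha$ in (6) can be written as follows:

$$\alpha(x) = \begin{cases} h^{-2} \displaystyle\int_{S(x)} \kappa(x, x')\, c(x')\, dS_{x'}, & x \in S_h, \\ 0, & x \in \omega \setminus S_h, \end{cases}$$

where $\kappa(x, x') = \left(1 - \frac{|x_1 - x_1'|}{h}\right)\left(1 - \frac{|x_2 - x_2'|}{h}\right)$,
$S(x) = S \cap e(x)$, $e(x) = (x_1 - h, x_1 + h) \times (x_2 - h, x_2 + h)$ is
the cell attached to the internal node $x \in \omega$, and
$S_h = \{x \in \omega : S(x) \ne \emptyset\}$.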
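The property that Steklov averages turn derivatives into finite differences, on which the construction of the scheme relies, can be checked numerically. The sketch below is only an illustration: it evaluates the one-dimensional average with a composite Simpson rule (an assumption made for convenience), and the names `steklov` and `steklov_plus` are hypothetical.

```python
import math


def steklov(f, x, h, n=200):
    """T f(x): mean value of f over (x - h/2, x + h/2), composite Simpson rule.
    n must be even."""
    a = x - h / 2
    step = h / n
    total = f(a) + f(a + h)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * step)
    return total * step / 3 / h


def steklov_plus(f, x, h):
    """T^+ f(x) = T f(x + h/2): mean value of f over (x, x + h)."""
    return steklov(f, x + h / 2, h)
```

Applying `steklov_plus` to a derivative u' reproduces the forward difference of u, and applying `steklov` twice to u'' reproduces the second central difference, up to quadrature and rounding error.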



3 Convergence of the Difference Schemes 



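Let us assume that $f \in W_2^{-1}(\Omega)$. Then the problem (1), (2) can be
formulated in the weak form:

$$a(u, v) = \langle f, v \rangle, \quad \forall v \in \mathring{W}_2^1(\Omega),$$

where

$$a(u, v) = \int_\Omega \left( \frac{\partial u}{\partial x_1} \frac{\partial v}{\partial x_1} + \frac{\partial u}{\partial x_2} \frac{\partial v}{\partial x_2} \right) dx_1\,dx_2 + \int_S c\,u\,v\,dS$$

and $\langle f, v \rangle$ is the duality pairing on
$W_2^{-1}(\Omega) \times \mathring{W}_2^1(\Omega)$.

The following assertions hold.

Lemma 2. For each $f \in W_2^{-1}(\Omega)$ the problem (1), (2) has a unique
solution $u \in \mathring{W}_2^1(\Omega)$.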

Lemma 3. If $f \in W_2^{\theta - 1}(\Omega)$, $-1/2 < \theta < 1/2$, then the
problem (1), (2) has a unique weak solution $u \in W_2^{\theta + 1}(\Omega)$.



3.1 Global Estimate 

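The error z = u - y satisfies the equation

$$-\Delta_h z + \alpha z = -\psi_{1, x_1 \bar{x}_1} - \psi_{2, x_2 \bar{x}_2} + \chi \ \text{ in } \omega; \qquad z = 0 \ \text{ on } \gamma, \tag{7}$$

where

$$\psi_i = u - T_{3-i}^2 u, \quad i = 1, 2,$$

$$\chi = \alpha u - h^{-2} \int_{S(x)} \kappa(x, x')\, c(x')\, u(x')\, dS_{x'}, \quad x \in S_h; \qquad \chi = 0, \quad x \in \omega \setminus S_h.$$

Let $H_h$ be the set of mesh functions defined on the mesh $\bar\omega$ and
equal to zero on $\gamma$. We define the scalar products

$$(y, v)_h = h^2 \sum_{x \in \omega} y(x)\,v(x), \qquad [y, v)_{h,i} = h^2 \sum_{x \in \omega_i} y(x)\,v(x)$$

and the corresponding norms $\|w\|_h$ and $|[w\|_{h,i}$, where
$\omega_i = \{x \in \bar\omega : 0 \le x_i < 1,\ 0 < x_{3-i} < 1\}$, i = 1, 2.
Let us also introduce the $W_2^1$ mesh norm:

$$\|w\|^2_{W_2^1(\omega)} = |w|^2_{W_2^1(\omega)} + \|w\|^2_h, \qquad |w|^2_{W_2^1(\omega)} = |[w_{x_1}\|^2_{h,1} + |[w_{x_2}\|^2_{h,2}.$$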



Taking the scalar product of (7) with z and summing by parts, we get

$$|[z_{x_1}\|^2_{h,1} + |[z_{x_2}\|^2_{h,2} + h^2 \sum_{x \in S_h} \alpha z^2 = [\psi_{1,x_1}, z_{x_1})_{h,1} + [\psi_{2,x_2}, z_{x_2})_{h,2} + h^2 \sum_{x \in S_h} \chi z.$$

Using the difference analog of the Friedrichs inequality [7], we get the a priori 
estimate 




Estimating the terms in the right-hand side of (8) using methodology proposed 
in [7] and [8], we obtain the following result. 

Theorem 1. The solution of the scheme (6) converges to the solution of the
differential problem (1), (2) and the following convergence rate estimate is
valid:

$$\|u - y\|_{W_2^1(\omega)} \le C h^{\theta} \|u\|_{W_2^{1+\theta}(\Omega)}, \quad 0 < \theta < 1/2.$$

3.2 An Improved Estimate for the Rate of Convergence 

Let us suppose that the solution of the problem (1), (2) has higher smoothness
in the regions $\Omega_1$ and $\Omega_2$. Then the following improved estimate
for the rate of convergence of the difference scheme can be proved:

3.3 Line Interface 

Here we consider the case when S is a segment parallel to one of the coordinate
axes. Let, for example, S be given by the equation $x_2 = \zeta$. Contrary to
the previous cases, in this section we assume that $\zeta = Jh$. Notice that
then the inequality $c_0/h \le \alpha \le c_1/h$ is fulfilled.
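The bound on $\alpha$ can be checked directly from the integral representation of the coefficient. The short derivation below assumes the bilinear kernel $\kappa(x, x') = (1 - |x_1 - x_1'|/h)(1 - |x_2 - x_2'|/h)$, which is what the composition $T_1^2 T_2^2$ of Steklov averages produces; this explicit form is our reading and is not quoted from the text.

```latex
% Node x = (x_1, \zeta) on the mesh line x_2 = \zeta = Jh, so that
% S(x) = \{ (x_1', \zeta) : |x_1' - x_1| < h \} and the x_2-factor of \kappa equals 1.
\alpha(x) = h^{-2} \int_{x_1 - h}^{x_1 + h}
            \Bigl( 1 - \frac{|x_1 - x_1'|}{h} \Bigr)\, c(x_1')\, dx_1' ,
\qquad
\int_{x_1 - h}^{x_1 + h} \Bigl( 1 - \frac{|x_1 - x_1'|}{h} \Bigr)\, dx_1' = h .
% Using 0 < c_0 \le c \le c_1 from (3):
\frac{c_0}{h} \;\le\; \alpha(x) \;\le\; \frac{c_1}{h} .
```

Nodes on the neighbouring mesh lines $x_2 = \zeta \pm h$ do not belong to $S_h$, because the open cell e(x) does not meet S there.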

Taking the scalar product of (7) with z and summing by parts, we get the a
priori estimate
xeSh



1/2 / 7 , \ 1/2 



xeSh 



v2\ 1/2 

xeSh ^ 



where ■i/'i(3;i)C) = ~ 



h r du 



6 



dx-2 



= --c{xi)u{xi,C) and tpi{xi,C) = 
_ _HxiX) 6 

V'i(a;i,C) - V'i(a;i,C)- 

Now, the following estimate for the rate of convergence of the scheme (6) can 
be established: 



\u-y\\w}, < Ch'^ 



f 


d^u 




d^u 




d^u 


V 


dx\ dx 2 


1 

L2{0) 


dx\ dxl 


1 


dx\ dxl 



L‘2{^2) 



l<^llvF|(o,i) \W\w^{S) I ■ 
Abstract. In this paper finite-difference schemes approximating the
one-dimensional initial-boundary value problems for the heat equation
with concentrated capacity are derived. An abstract operator method is
developed for studying such problems. Convergence rate estimates consistent
with the smoothness of the data are obtained.



1 Introduction 

One interesting class of parabolic initial-boundary value problems (IBVPs)
models processes in heat-conducting media with concentrated capacity. In this
case the Dirac delta distribution is involved in the heat capacity coefficient
and, consequently, the jump of the heat flow at the singular point is
proportional to the time derivative of the temperature. Dynamical boundary
conditions cause a similar effect [4], [8]. These problems are non-standard and
classical analysis is difficult to apply to the convergence analysis.

In the present paper finite-difference schemes (FDSs) approximating the one- 
dimensional IBVPs for the heat equation with concentrated capacity or dynami- 
cal boundary conditions are derived. An abstract operator’s method is developed 
for studying such problems. Sobolev’s norms with weight operator, correspond- 
ing to norms L 2 , and are constructed. In these norms convergence 

rate estimates compatible with the smoothness of the IBVP data are obtained.
Analogous results for equations with constant coefficients are obtained in [6].
Convergence of FDSs for problems with smooth solutions was investigated
in [1], [2], [3] and [15].

2 Preliminary Results 

Let H be a real separable Hilbert space endowed with inner product
$(\cdot, \cdot)$ and norm $\|\cdot\|$, and let S be an unbounded selfadjoint
positive definite linear operator with domain D(S) dense in H. The product
$(u, v)_S = (Su, v)$ ($u, v \in D(S)$) satisfies the inner product axioms.
Completing D(S) in the norm $\|u\|_S = (u, u)_S^{1/2}$ we obtain a Hilbert space
$H_S \subset H$. The inner product $(u, v)$ continuously extends to
$H_S^* \times H_S$, where $H_S^*$ is the adjoint space for $H_S$. The operator S
extends to a mapping $S : H_S \to H_S^*$. There exists an unbounded selfadjoint
positive definite linear operator $S^{1/2}$ [10], [7], such that
$D(S^{1/2}) = H_S$ and $(u, v)_S = (Su, v) = (S^{1/2}u, S^{1/2}v)$. We also
define the Sobolev spaces $W_2^s(a, b; H)$, $W_2^0(a, b; H) = L_2(a, b; H)$, of
functions u = u(t) mapping the interval $(a, b) \subset R$ into H [7].

Let A and B be unbounded selfadjoint positive definite linear operators, not
depending on t, in the Hilbert space H, with D(A) dense in H and
$H_A \subset H_B$. In general, A and B do not commute. We consider an abstract
Cauchy problem (comp. [16], [9])

$$B\frac{du}{dt} + Au = f(t), \quad 0 < t < T; \qquad u(0) = u_0, \tag{1}$$

where $u_0$ is a given element in $H_B$, $f(t) \in L_2(0, T; H_{A^{-1}})$ is a
given function and u(t) is an unknown function from (0, T) into $H_A$.

The following proposition holds.

Lemma 1. The solution u of the problem (1) satisfies a priori estimates: 



lo 






B-i 



duft) 



dt 



B 



dt<c(^\\uofA+ \\m\\i-,dt^, 



ifuo e Ha and f € L 2 ( 0 ,T; Hb-i); 
if € Hb and f G L 2 ( 0 ,T; and 

pT 



i+j^ wmwUdt 



\u{t)\\ldt<C{\\Bu,rA- 



\\A-^m\\Bdt 



if Buo G Ha-i and A ^f G L 2 { 0 ,T; Hb)- 

Setting f(t) = dg(t)/dt in (1), we get the Cauchy problem

$$B\frac{du}{dt} + Au = \frac{dg}{dt}, \quad 0 < t < T; \qquad u(0) = u_0. \tag{2}$$

The following assertion is valid.

Lemma 2. The solution u of the problem (2) satisfies a priori estimates:



pT pT 






\u{t)w\dt + £ £ ikw _»(oiii ^ 
-dtdt'+f + 



|woIIb+ 



\\9{t)-g{t' 



\t-t'\ 



R-i dt 
if uq G Hb and g G Hb-^); and

£ \\u{t)\\l dt < c(^\\Buo - gm\U + d?j , 

if Buo - g{0) G Ha~i and g G i2(0,T; Hb-i). 

Analogous results hold for operator-difference schemes. Let Hh be finite- 
dimensional real Hilbert space with inner product {■,-)h and norm || • ||/i. Let Ah 
and Bh be constant selfadjoint positive linear operators in Hh, in general case 
noncomutative. By Hs^, where Sh = > 0, we denote the space Hs^ = Hh 

with inner product (v, w)sh = {ShV, w)h and norm ||u||s;, = {ShV, v)ff . 

Let LOr be an uniform mesh on (0,T) with the step size r = T/m, lo~ = 
Wr U {0}, ujf = Wt- U {T} and uir = oJtGI {0,T}. Further we shall use standard 
denotation of the theory of difference schemes [11]. 

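We consider the simplest two-level operator-difference scheme

$$B_h v_{\bar t} + A_h v = \varphi(t), \quad t \in \omega_\tau^+; \qquad v(0) = v_0, \qquad (3)$$

where $v_0$ is a given element of $H_h$, $\varphi(t)$ is also given and $v(t)$ is an unknown function with values in $H_h$. Let us also consider the scheme

$$B_h v_{\bar t} + A_h v = \psi_{\bar t}, \quad t \in \omega_\tau^+; \qquad v(0) = v_0, \qquad (4)$$

where $\psi(t)$ is a given function with values in $H_h$.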
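
To make the structure of scheme (3) concrete, the following minimal Python sketch advances $B_h v_{\bar t} + A_h v = \varphi$ in a two-dimensional space, reading $v_{\bar t}$ as the backward difference so that each step solves $(B_h + \tau A_h)\,v = B_h \check v + \tau\varphi$, where $\check v$ is the previous time level. The matrices `A` and `B`, the data `v0` and all function names are illustrative assumptions and do not come from the paper.

```python
# Sketch of the implicit two-level scheme  B v_tbar + A v = phi,
# i.e. (B + tau*A) v_new = B v_old + tau*phi, in a 2-dimensional space.
# A, B, v0 and all function names are illustrative assumptions.

def matvec(M, v):
    return [M[0][0] * v[0] + M[0][1] * v[1],
            M[1][0] * v[0] + M[1][1] * v[1]]

def energy(S, v):
    """Squared energy norm ||v||_S^2 = (S v, v)."""
    w = matvec(S, v)
    return w[0] * v[0] + w[1] * v[1]

def solve2(M, r):
    """Solve a 2x2 linear system by Cramer's rule."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [(r[0] * M[1][1] - M[0][1] * r[1]) / det,
            (M[0][0] * r[1] - M[1][0] * r[0]) / det]

def implicit_step(A, B, v_old, phi, tau):
    """One time step: solve (B + tau*A) v_new = B v_old + tau*phi."""
    M = [[B[i][j] + tau * A[i][j] for j in range(2)] for i in range(2)]
    Bv = matvec(B, v_old)
    return solve2(M, [Bv[i] + tau * phi[i] for i in range(2)])

A = [[2.0, 1.0], [1.0, 3.0]]   # symmetric positive definite
B = [[1.0, 0.0], [0.0, 4.0]]   # symmetric positive definite, A*B != B*A
tau, steps = 0.1, 50
v0 = [1.0, -1.0]

v, sum_vt = v0, 0.0
for _ in range(steps):
    v_new = implicit_step(A, B, v, [0.0, 0.0], tau)
    v_t = [(v_new[i] - v[i]) / tau for i in range(2)]
    sum_vt += tau * energy(B, v_t)   # accumulates tau * sum ||v_tbar||_B^2
    v = v_new

bound = 0.5 * energy(A, v0)          # (1/2) ||v_0||_A^2
```

For $\varphi = 0$, taking the inner product of the scheme with $2\tau v_{\bar t}$ gives $2\tau\|v_{\bar t}\|^2_{B_h} + \|v\|^2_{A_h} - \|\check v\|^2_{A_h} + \tau^2\|v_{\bar t}\|^2_{A_h} = 0$, so `sum_vt` cannot exceed `bound`; this is the kind of energy bound the a priori estimates generalize.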

The following analogues of Lemmas 1 and 2 hold true. 

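Lemma 3. The solution $v$ of the problem (3) satisfies a priori estimates:

$$\tau \sum_{t\in\omega_\tau^+} \Big( \|A_h v(t)\|^2_{B_h^{-1}} + \|v_{\bar t}(t)\|^2_{B_h} \Big) \le C \Big( \|v_0\|^2_{A_h} + \tau \sum_{t\in\omega_\tau^+} \|\varphi(t)\|^2_{B_h^{-1}} \Big),$$

$$\tau \sum_{t\in\bar\omega_\tau} \|v(t)\|^2_{A_h} + \tau^2 \sum_{t\in\bar\omega_\tau}\ \sum_{t'\in\bar\omega_\tau,\ t'\ne t} \frac{\|v(t)-v(t')\|^2_{B_h}}{|t-t'|^2} \le C \Big( \|v_0\|^2_{B_h} + \tau\,\|v_0\|^2_{A_h} + \tau \sum_{t\in\omega_\tau^+} \|\varphi(t)\|^2_{A_h^{-1}} \Big),$$

$$\tau \sum_{t\in\omega_\tau^+} \|v(t)\|^2_{B_h} \le C \Big( \|B_h v_0\|^2_{A_h^{-1}} + \tau \sum_{t\in\omega_\tau^+} \|A_h^{-1}\varphi(t)\|^2_{B_h} \Big).$$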

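Lemma 4. The solution $v$ of the problem (4) satisfies a priori estimates:

$$\tau \sum_{t\in\bar\omega_\tau} \|v(t)\|^2_{A_h} + \tau^2 \sum_{t\in\bar\omega_\tau}\ \sum_{t'\in\bar\omega_\tau,\ t'\ne t} \frac{\|v(t)-v(t')\|^2_{B_h}}{|t-t'|^2} \le C \Big[ \|v_0\|^2_{B_h} + \tau\,\|v_0\|^2_{A_h} + \tau^2 \sum_{t\in\bar\omega_\tau}\ \sum_{t'\in\bar\omega_\tau,\ t'\ne t} \frac{\|\psi(t)-\psi(t')\|^2_{B_h^{-1}}}{|t-t'|^2} + \tau \sum_{t\in\omega_\tau} \Big(\frac{1}{t} + \frac{1}{T-t}\Big) \|\psi(t)\|^2_{B_h^{-1}} \Big],$$

$$\tau \sum_{t\in\omega_\tau^+} \|v(t)\|^2_{B_h} \le C \Big( \|B_h v_0 - \psi(0)\|^2_{A_h^{-1}} + \tau \sum_{t\in\omega_\tau^+} \|\psi(t)\|^2_{B_h^{-1}} \Big).$$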

3 Heat Equation with Concentrated Capacity 



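Let us consider the IBVP for the heat equation with concentrated capacity at the interior point $x = \xi$ [8]:

$$[c(x) + K\delta(x-\xi)]\,\frac{\partial u}{\partial t} - \frac{\partial}{\partial x}\Big(a(x)\,\frac{\partial u}{\partial x}\Big) = f(x,t), \quad (x,t) \in Q, \qquad (5)$$

$$u(0,t) = 0, \quad u(1,t) = 0, \quad 0 < t < T, \qquad (6)$$

$$u(x,0) = u_0(x), \quad x \in (0,1), \qquad (7)$$

where $Q = (0,1)\times(0,T)$, $K > 0$, $0 < c_1 \le a(x) \le c_2$, $0 < c_3 \le c(x) \le c_4$ and $\delta(x)$ is the Dirac distribution [14]. From (5) it follows that the solution of the problem satisfies the equation

$$c(x)\,\frac{\partial u}{\partial t} - \frac{\partial}{\partial x}\Big(a(x)\,\frac{\partial u}{\partial x}\Big) = f(x,t)$$

for $(x,t) \in Q_1 = (0,\xi)\times(0,T)$ and $(x,t) \in Q_2 = (\xi,1)\times(0,T)$, while for $x = \xi$ the conjugation conditions

$$[u]_{x=\xi} \equiv u(\xi+0,t) - u(\xi-0,t) = 0, \qquad \Big[a\,\frac{\partial u}{\partial x}\Big]_{x=\xi} = K\,\frac{\partial u(\xi,t)}{\partial t}$$

are fulfilled.

It is easy to see that the IBVP (5)-(7) can be written in the form (1), where $H = L_2(0,1)$, $H_A = \mathring W_2^1(0,1)$, $Au = -\frac{\partial}{\partial x}\big(a(x)\frac{\partial u}{\partial x}\big)$ and $Bu = [c(x) + K\delta(x-\xi)]\,u(x,t)$. Further,

$$\|w\|_A^2 = \int_0^1 a(x)\,[w'(x)]^2\,dx, \quad w \in \mathring W_2^1(0,1),$$

$$\|w\|_B^2 = \int_0^1 c(x)\,w^2(x)\,dx + K w^2(\xi) \asymp \|w\|^2_{L_2(0,1)} + w^2(\xi), \quad w \in \mathring W_2^1(0,1).$$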

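Let $\omega_h = \{x_1, x_2, \dots, x_{n-1}\}$ be a nonuniform mesh in $(0,1)$, containing the node $\xi$. Denote $\omega_h^- = \omega_h \cup \{x_0\}$, $\omega_h^+ = \omega_h \cup \{x_n\}$, $\bar\omega_h = \omega_h \cup \{x_0, x_n\}$, $x_0 = 0$, $x_n = 1$ and $h_i = x_i - x_{i-1}$. Also denote $v_x = (v_+ - v)/h_+$, $v_{\bar x} = (v - v_-)/h$, $v_{\hat x} = (v_+ - v)/\hbar$, $v = v(x)$, $v_\pm = v(x_\pm)$, $x = x_i$, $x_\pm = x_{i\pm 1}$, $h = h_i$, $h_+ = h_{i+1}$, $\hbar = (h + h_+)/2$. We assume that $1/c_0 \le h_+/h \le c_0$, $c_0 = \text{const} \ge 1$.

We approximate the IBVP (5)-(7) on the mesh $\bar\omega_h \times \bar\omega_\tau$ by the implicit FDS with averaged right hand side

$$(c + K\delta_h)\,v_{\bar t} - (\tilde a\,v_{\bar x})_{\hat x} = T_x^2 T_t^- f, \quad (x,t) \in \omega_h \times \omega_\tau^+, \qquad (8)$$

$$v(0,t) = 0, \quad v(1,t) = 0, \quad t \in \omega_\tau^+, \qquad (9)$$

$$v(x,0) = u_0(x), \quad x \in \omega_h, \qquad (10)$$

where $\tilde a(x) = [a(x) + a(x-h)]/2$,

$$\delta_h = \delta_h(x-\xi) = \begin{cases} 0, & x \in \omega_h \setminus \{\xi\}, \\ 1/\hbar, & x = \xi, \end{cases}$$

is the mesh Dirac function, and $T_x^2$, $T_t^-$ are Steklov averaging operators [12]:

$$T_t^- f(x,t) = T_t^+ f(x,t-\tau) = \frac{1}{\tau}\int_{t-\tau}^{t} f(x,t')\,dt',$$

$$T_x^- f(x,t) = \frac{1}{h}\int_{x_-}^{x} f(x',t)\,dx', \qquad T_x^+ f(x,t) = \frac{1}{h_+}\int_{x}^{x_+} f(x',t)\,dx',$$

$$T_x^2 f(x,t) = \frac{1}{\hbar}\int_{x_-}^{x_+} \kappa(x,x')\,f(x',t)\,dx', \qquad \kappa(x,x') = \begin{cases} 1 + \dfrac{x'-x}{h}, & x_- < x' < x, \\[2mm] 1 - \dfrac{x'-x}{h_+}, & x < x' < x_+. \end{cases}$$

Notice that these operators commute and map partial derivatives into finite differences, for example $T_x^2\,\frac{\partial^2 u}{\partial x^2} = u_{\bar x \hat x}$, $T_t^-\,\frac{\partial u}{\partial t} = u_{\bar t}$.

Let $H_h$ be the set of functions defined on the mesh $\bar\omega_h$ and equal to zero at $x = 0$ and $x = 1$. We define the inner products $(v,w)_h = \sum_{x\in\omega_h} v(x)\,w(x)\,\hbar$, $(v,w]_h = \sum_{x\in\omega_h^+} v(x)\,w(x)\,h$ and the corresponding norms $\|w\|_h = \|w\|_{L_{2,h}} = (w,w)_h^{1/2}$, $\|w]|_h = (w,w]_h^{1/2}$.

The FDS (8)-(10) can be reduced to the form (3) by setting $A_h v = -(\tilde a\,v_{\bar x})_{\hat x}$ and $B_h v = (c + K\delta_h)\,v$. For $w \in H_h$ we have

$$\|w\|^2_{A_h} = (A_h w, w)_h = \sum_{x\in\omega_h^+} \tilde a(x)\,w_{\bar x}^2(x)\,h \asymp \|w_{\bar x}]|_h^2,$$

$$\|w\|^2_{B_h} = (B_h w, w)_h = \sum_{x\in\omega_h} c(x)\,w^2(x)\,\hbar + K w^2(\xi) \asymp \|w\|^2_{B_{0h}},$$

$$\|w\|^2_{B_h^{-1}} = (B_h^{-1} w, w)_h = \sum_{x\in\omega_h\setminus\{\xi\}} \frac{w^2(x)}{c(x)}\,\hbar + \frac{\hbar^2\,w^2(\xi)}{K + \hbar\,c(\xi)} \asymp \|w\|^2_{B_{0h}^{-1}},$$

where $B_{0h} w = (1 + \delta_h)\,w$.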
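
As an illustration of how the implicit scheme (8)-(10) with the mesh Dirac function can be advanced in time, here is a minimal Python sketch on a nonuniform mesh. It is not the authors' implementation: the right hand side is sampled pointwise instead of being Steklov-averaged, and the mesh, the coefficients and the names `thomas` and `implicit_step` are assumptions chosen only for the example.

```python
import math

def thomas(lower, diag, upper, rhs):
    """Solve a tridiagonal system (lower[0] and upper[-1] are unused)."""
    n = len(diag)
    c, d = [0.0] * n, [0.0] * n
    c[0], d[0] = upper[0] / diag[0], rhs[0] / diag[0]
    for i in range(1, n):
        den = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / den
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / den
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x

def implicit_step(x, v_old, tau, a, c, K, xi_index, f_vals):
    """One step of (c + K*delta_h) v_tbar - (a~ v_xbar)_xhat = f, v = 0 at both ends."""
    n = len(x) - 1
    lower, diag, upper, rhs = [], [], [], []
    for i in range(1, n):
        h = x[i] - x[i - 1]
        h_plus = x[i + 1] - x[i]
        hbar = 0.5 * (h + h_plus)
        a_left = 0.5 * (a(x[i]) + a(x[i - 1]))      # a-tilde at x_i
        a_right = 0.5 * (a(x[i + 1]) + a(x[i]))     # a-tilde at x_{i+1}
        d = c(x[i]) + (K / hbar if i == xi_index else 0.0)   # c + K*delta_h
        lo = tau * a_left / (h * hbar)
        up = tau * a_right / (h_plus * hbar)
        lower.append(-lo)
        upper.append(-up)
        diag.append(d + lo + up)
        rhs.append(d * v_old[i] + tau * f_vals[i])
    return [0.0] + thomas(lower, diag, upper, rhs) + [0.0]

x = [0.0, 0.1, 0.25, 0.5, 0.6, 0.8, 1.0]   # nonuniform mesh containing xi = 0.5
xi_index = 3
v0 = [0.0] + [math.sin(math.pi * s) for s in x[1:-1]] + [0.0]
v = v0
for _ in range(20):
    v = implicit_step(x, v, 0.01, lambda s: 1.0 + s, lambda s: 1.0,
                      2.0, xi_index, [0.0] * len(x))
```

With $f = 0$ the system matrix is diagonally dominant with nonpositive off-diagonal entries, so the computed solution stays between zero and the maximum of the initial data; the concentrated capacity enters only through the enlarged diagonal coefficient at the node $\xi$.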

Let us introduce the mesh Sobolev norms with weight operator $B_{0h}$:



\\w\\~ = ||w||r„, = \\W 

-^2, A 



Boh = ll“'llL,h ' 



w\0 



M~. =lk.]lL + lklllo., 






2,h 



'B- 



^x\\h* ' 



I Boh ’ 



i^2,Ar 






t^UJ-r 



'-'2,h 
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'w. 



1 . 1/2 

2,/ix , t ^ 1 

’ +£Z/.i ’ +£Z/.i *ir~, . 



I- 

'VK, 



tGtJT t ' GiI > T , t'#i 



l^-^T 



2 f I ^ — 2 j h. I 



t^LJ-r 



Lemma 5. Let $A_h v = -(\tilde a\,v_{\bar x})_{\hat x}$ and $B_h v = (c + K\delta_h)\,v$, where $a$ and $c$ are
continuous functions. Then the norm $\|v\|_{A_h}$ is equivalent to the mesh norm
$W_{2,h}^1$. If the function $a(x)$ is continuously differentiable then the norm $\|A_h v\|_{B_h^{-1}}$

is equivalent to the mesh norm 



— 1 1 /2 

3.1 Convergence in $\widetilde W_{2,h\tau}^{1,1/2}$



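Let $u$ be the solution of the IBVP (5)-(7) and $v$ the solution of FDS (8)-(10). The error $z = u - v$ satisfies

$$(c + K\delta_h)\,z_{\bar t} - (\tilde a\,z_{\bar x})_{\hat x} = \psi_{\bar t} - \chi_{\hat x}, \quad (x,t) \in \omega_h \times \omega_\tau^+, \qquad (11)$$

$$z(0,t) = 0, \quad z(1,t) = 0, \quad t \in \omega_\tau^+, \qquad (12)$$

$$z(x,0) = 0, \quad x \in \omega_h, \qquad (13)$$

where $\psi = cu - T_x^2(cu) + \big(\frac{h^2}{6}(cu)_{\bar x}\big)_{\hat x}$ and $\chi = \tilde a\,u_{\bar x} - T_x^- T_t^-\big(a\,\frac{\partial u}{\partial x}\big) + \frac{h^2}{6}(cu)_{\bar x \bar t}$.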

From Lemmas 3-5, using the inequality

$$\|\chi_{\hat x}\|_{A_h^{-1}} = \max_{w\in H_h} \frac{|(\chi_{\hat x}, w)_h|}{\|w\|_{A_h}} = \max_{w\in H_h} \frac{|-(\chi, w_{\bar x}]_h|}{\|w\|_{A_h}} \le \frac{1}{\sqrt{c_1}}\,\|\chi]|_h,$$

one immediately obtains the following a priori estimate for the problem (11)-(13)






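$$\|z\|_{\widetilde W_{2,h\tau}^{1,1/2}} \le C \Big[ \tau^2 \sum_{t\in\bar\omega_\tau}\ \sum_{t'\in\bar\omega_\tau,\ t'\ne t} \frac{\|\psi(\cdot,t)-\psi(\cdot,t')\|^2_{B_h^{-1}}}{|t-t'|^2} + \tau \sum_{t\in\omega_\tau} \Big(\frac{1}{t} + \frac{1}{T-t}\Big) \|\psi(\cdot,t)\|^2_{B_h^{-1}} + \tau \sum_{t\in\omega_\tau^+} \|\chi(\cdot,t)]|_h^2 \Big]^{1/2}. \qquad (14)$$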



Therefore, in order to estimate the convergence rate of FDS (8)-(10) in
$\widetilde W_{2,h\tau}^{1,1/2}$ it is sufficient to estimate the right hand side terms in (14). Using
integral representations of $\psi$ and $\chi$ and the form of the corresponding norms, similarly
as in [5], [6], we obtain the following convergence rate estimate



lAlppi.i/^ “ ^ (^max + x) (||a||iy|(o,l) + || c|| 14/I (0.1)) 



d'^i 



dxdt 



L2{Q) 



d~C h^^^ i/inlTi' (ll«llw|(o.i) + l|c||w|(o.i) 



u||^3,3/2(Qi) -I- ||u||^3,3/2^g^
3.2 Convergence in $\widetilde L_{2,h\tau}$

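Let us consider the following approximation of the initial condition (7)

$$v(x,0) = \begin{cases} \dfrac{T_x^2(cu_0)(x)}{c(x)}, & x \in \omega_h \setminus \{\xi\}, \\[3mm] \dfrac{K u_0(\xi) + \hbar\,T_x^2(cu_0)(\xi)}{K + \hbar\,c(\xi)}, & x = \xi. \end{cases} \qquad (15)$$

Let $u$ be the solution of IBVP (5)-(7) and $v$ the solution of FDS (8), (9), (15).
The error $z = u - v$ satisfies the conditions (11), (12) and

$$(c(x) + K\delta_h(x-\xi))\,z(x,0) = \psi(x,0) - \beta_{\hat x}(x,0), \quad x \in \omega_h, \qquad (16)$$

where $\beta = \frac{h^2}{6}(cu)_{\bar x}$. The term $\chi_{\hat x}$ can be represented in the form $\chi_{\hat x} = (\tilde a\,\mu_{\bar x})_{\hat x} + \alpha_{\hat x} + \beta_{\hat x \bar t}$, where $\mu = u - T_t^- u$ and $\alpha = \tilde a\,T_x^- T_t^-\big(\frac{\partial u}{\partial x}\big) - T_x^- T_t^-\big(a\,\frac{\partial u}{\partial x}\big)$.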

Lemma 6. For the solution of FDS

$$(c + K\delta_h)\,z_{\bar t} - (\tilde a\,z_{\bar x})_{\hat x} = -\beta_{\hat x \bar t}, \quad (x,t) \in \omega_h \times \omega_\tau^+,$$

with homogeneous initial and Dirichlet boundary conditions the following a priori
estimate holds:












1/2 



< 



< C 



E 



ll/3(- ,i) - P{- ,t') 



IL 



t^UJT t'y^t 






+ T 



t^UJ-r 



(i+i^)ii«' 



,t)]| 



1/2 



Using Lemmas 3-6 for FDS (11), (12), (16) we obtain a priori estimate in 
the form 



kll~ <c 

^2,Ht 



E 



L tGu:! 



-^1 + ^ IIm( ‘ ^ ^)\\‘Boh ^ ^ ll*^( ‘ ’ ^)]lL+ 

1/2 



t£u)2 



t£uj2. 



t'GUJT, ' t^UJT 

(17) 

From (17), using integral representations of $\psi$, $\mu$, $\alpha$ and $\beta$, we obtain the
following convergence rate estimate for FDS (8), (9), (15)



< C{h4ax x/lnl/r + r) (|| 




VP|(0,1) + lklllV|(0,l) + 1 

du{f, ■ ) \ 



dt 



Z, 2 ( 0 .T)^ 



X 



(18) 



Remark 1. In estimate (18) the requirements on the smoothness of the coefficients $a$
and $c$ are overstated. An analogous estimate for $a, c \in W_2^1(0,1)$ can be obtained
using the so called "exact FDS" [5] for the approximation of $\frac{\partial}{\partial x}\big(a(x)\frac{\partial u}{\partial x}\big)$.

3.3 Approximation and Convergence in $\widetilde W_{2,h\tau}^{2,1}$

Following [13] we approximate the equation (5) as follows 

K S h') V^-\ ^ ^ ^xx ^xx ^x') — ^x^t f' 

The boundary and initial conditions are approximated by (9) and (10). We also
assume that $c_0 < 2$ and $h_+ = h$ for $x = \xi$.

The error z = u — v satisfies FDS 



(c + K Sh) Zi 



h+ — h 



(c Z^xt (fi Zx^x 



— h 



( Qjx Zxx 



s) = 



with homogeneous boundary and initial conditions (12) and (13). Here 




We also denote AihZ = {ax Zx± ~ a®* Zx) and BihZ = (cz)x ■ 

The following assertions hold true. 

Lemma 7. If c £ C^[0, 1] and the maximal step size of the mesh ujh is suffi- 
ciently small {hmax < (1/6 — e)/||c||ci[o,i], 0 < e < 1/6) then the following 
inequality holds 



\{BihZ, z)h\<{l-e)\\z\\%^, zGHh. 
Lemma 8. If a £ C^[0, 1] then 



\\Aihz\\g-i <^^^^\\a\\c2[o,i]\\z\\^2 , z£Hh. 

0^ D 2,h 

From Lemmas 1, 7 and 8 one obtains the following a priori estimate for the 
solution of FDS (19), (9), (10), assuming hmax is sufficiently small 



1/2 



( 20 ) 



From (20), using integral representation of (p, one obtains the following con- 
vergence rate estimate for FDS (19), (9), (10) 



^ (^max + 'T) (ikll W|(0,1) + l|c|| W|(0.1)) |l 






+ 



du 






du 

m 



Wf°iQ2) 



du 
LdxJ (c,.) 



i2(0.T) 



r d‘^1 



-dxdti ({, ■ ) 



L2{0,T)
4 Problem with Dynamical Boundary Condition



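Let us consider the IBVP for the heat equation with dynamical boundary condition at $x = 0$ (see [15]):

$$c(x)\,\frac{\partial u}{\partial t} - \frac{\partial}{\partial x}\Big(a(x)\,\frac{\partial u}{\partial x}\Big) = f(x,t), \quad x \in (0,1), \quad 0 < t < T, \qquad (21)$$

$$K\,\frac{\partial u(0,t)}{\partial t} = a(0)\,\frac{\partial u(0,t)}{\partial x}, \quad u(1,t) = 0, \quad 0 < t < T, \qquad (22)$$

and initial condition (7), where $K > 0$, $0 < c_1 \le a(x) \le c_2$ and $0 < c_3 \le c(x) \le c_4$.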

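IBVP (21), (22), (7) can be treated as the limit case of an IBVP in $(-\varepsilon,1)\times(0,T)$, with concentrated capacity at $x = 0$, when $\varepsilon \to +0$:

$$[c(x) + K\delta(x)]\,\frac{\partial u}{\partial t} - \frac{\partial}{\partial x}\Big(a(x)\,\frac{\partial u}{\partial x}\Big) = f(x,t), \quad x \in (-\varepsilon,1), \quad 0 < t < T,$$

$$\frac{\partial u(-\varepsilon,t)}{\partial x} = 0, \quad u(1,t) = 0, \quad 0 < t < T; \qquad u(x,0) = u_0(x), \quad x \in (-\varepsilon,1).$$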

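IBVP (21), (22), (7) can be written in the form (1) where $H = L_2(0,1)$, $H_A = \mathring W_2^1(0,1) = \{w \in W_2^1(0,1) : w(1) = 0\}$, $Au = -\frac{\partial}{\partial x}\big(a(x)\frac{\partial u}{\partial x}\big)$ and $Bu = [c(x) + K\delta(x)]\,u(x,t)$. We have

$$\|w\|_A^2 = \int_0^1 a(x)\,[w'(x)]^2\,dx \asymp \|w\|^2_{W_2^1(0,1)}, \quad w \in \mathring W_2^1(0,1),$$

$$\|w\|_B^2 = \int_0^1 c(x)\,w^2(x)\,dx + K w^2(0) \asymp \|w\|^2_{L_2(0,1)} + w^2(0), \quad w \in \mathring W_2^1(0,1).$$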

As in Sect. 3, we introduce a nonuniform mesh $\bar\omega_h$ on $[0,1]$. Let $H_h$ be
the space of mesh functions vanishing for $x = 1$. We define the inner product
$[v,w)_h = \frac{h_1}{2}\,v(0)\,w(0) + \sum_{x\in\omega_h} v(x)\,w(x)\,\hbar$ and the corresponding norm $|[w\|_h = |[w\|_{L_{2,h}} = [w,w)_h^{1/2}$.

We extend the definitions of $v_{\hat x}$, $\delta_h(x)$ and $T_x^2 f(x,t)$ to include the point $x = 0$:



Vx 



-, X G UJh 



Vj ^—1 

H-H± a; = 0 

/ll ’ o, 



Sh = 5h{x) = 



X € io? 



2/hi, X = 0, 



(l-|-)/(x',()<ii'. 

We approximate the IBVP (21), (22), (7) with the following FDS

$$(c + K\delta_h)\,v_{\bar t} - (\tilde a\,v_{\bar x})_{\hat x} = T_x^2 T_t^- f, \quad (x,t) \in \omega_h^- \times \omega_\tau^+, \qquad (23)$$

$$v(1,t) = 0, \quad t \in \omega_\tau^+, \qquad (24)$$

with the initial condition (10).

The FDS (23), (24), (10) can be rewritten in the form (3), where the difference
operators $A_h v = -(\tilde a\,v_{\bar x})_{\hat x}$ and $B_h v = [c(x) + K\delta_h(x)]\,v$ are defined also
for $x = 0$. For $w \in H_h$ we define the energetic norms $|[w\|_{A_h}$, $|[w\|_{B_h}$, $|[w\|_{B_h^{-1}}$
and the weighted Sobolev norms $|[w\|_{\widetilde W_{2,h\tau}^{1,1/2}}$, $|[w\|_{\widetilde W_{2,h\tau}^{2,1}}$ analogously as in
Sect. 3, substituting the norm $\|\cdot\|_h$ with $|[\cdot\|_h$.

Analogous results as for the equation with concentrated capacity hold true.
The FDS (23), (24), (10) satisfies the convergence rate estimate

^ ^ {^max + T^ (llnlllVKO,!) + 11^11^1(0.1)) ll^ll • 

FDS (23), (24) with the initial condition

$$v(x,0) = \begin{cases} \dfrac{T_x^2(cu_0)(x)}{c(x)}, & x \in \omega_h, \\[3mm] \dfrac{2K u_0(0) + h_1\,T_x^2(cu_0)(0)}{2K + h_1\,c(0)}, & x = 0, \end{cases}$$

satisfies the convergence rate estimate

\[z\\~^ Vlnl/r + r) (||c||^v2(o_i) + 



|n|lw|(o,i) + 1) 



(Q) 



911(0, • ) 



dt 



L 2 (o,t) 



FDS (19), (24), (10) with the boundary condition

$$(c + K\delta_h)\,v_{\bar t} - (\tilde a\,v_{\bar x})_{\hat x} = T_x^2 T_t^- f, \quad x = 0,$$

satisfies the convergence rate estimate



|[^||(^2,l < C (h‘^ax + ^) (ikll VV'|(0,l) + l|c||w|(0,l) 



du 






5 Weakly Parabolic Equation

Let us consider the following IBVP:

$$K\delta(x-\xi)\,\frac{\partial u}{\partial t} - \frac{\partial}{\partial x}\Big(a(x)\,\frac{\partial u}{\partial x}\Big) = f(x,t), \quad x \in (0,1), \quad 0 < t < T,$$

$$u(0,t) = 0, \quad u(1,t) = 0, \quad 0 < t < T; \qquad u(\xi,0) = u_0 = \text{const}, \qquad (25)$$

where $K > 0$, $0 < c_1 \le a(x) \le c_2$ and $\delta(x)$ is the Dirac distribution.

The IBVP (25) can also be rewritten in the form (1), where $Au = -\frac{\partial}{\partial x}\big(a(x)\frac{\partial u}{\partial x}\big)$ and $Bu = K\delta(x-\xi)\,u(x)$. In this case, $A$ is a positive definite operator in the space $H_A = \mathring W_2^1(0,1)$, while $B$ is a nonnegative operator in $H_A$ and $\|u\|_B^2 = K u^2(\xi)$. It is easy to see that the a priori estimates from Lemmas 1 and 2 not involving the inverse operator still hold.

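Using the notation from Sect. 3, we approximate IBVP (25) in the following way

$$K\delta_h\,v_{\bar t} - (\tilde a\,v_{\bar x})_{\hat x} = T_x^2 T_t^- f, \quad (x,t) \in \omega_h \times \omega_\tau^+,$$

$$v(0,t) = 0, \quad v(1,t) = 0, \quad t \in \omega_\tau^+; \qquad v(\xi,0) = u_0. \qquad (26)$$

The error $z = u - v$ satisfies the following FDS

$$K\delta_h\,z_{\bar t} - (\tilde a\,z_{\bar x})_{\hat x} = -\chi_{1,\hat x}, \quad (x,t) \in \omega_h \times \omega_\tau^+,$$

$$z(0,t) = 0, \quad z(1,t) = 0, \quad t \in \omega_\tau^+; \qquad z(\xi,0) = 0, \qquad (27)$$

where $\chi_1 = \tilde a\,u_{\bar x} - T_x^- T_t^-\big(a\,\frac{\partial u}{\partial x}\big)$.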

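FDS (27) is of the form (3), where $A_h v = -(\tilde a\,v_{\bar x})_{\hat x}$ is a positive definite linear
operator in $H_h$ and $B_h v = K\delta_h v$ is a nonnegative linear operator in $H_h$. We have

$$\|v\|_{A_h} = \Big( \sum_{x\in\omega_h^+} \tilde a\,v_{\bar x}^2\,h \Big)^{1/2}, \qquad \|v\|_{B_h} = \sqrt{K}\,|v(\xi)|.$$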

We also define the norm 



dvlU 



kll~nV2 = T XI + hx{-, t)]lL + ip)- 

Ke , i )-^ e , t')P 



E 

t£u)T t' t'z/zt 






From Lemma 3, using the discrete Friedrichs inequality and the imbedding theorem
$\max_{x\in\bar\omega_h} |v(x)| \le 0.5\,\|v_{\bar x}]|_h$ [11], we obtain the a priori estimate



r 1 

l^c^i/2 < C |r E llxi( ■ I ^)]lLj I 



giving the following convergence rate estimate of FDS (26) 

d'^u 



< C {h^a.x+T) ll«llw|(0,l) 



dxdt 



L2(Q) 



T II m|| ^2,o^q^ j + II m||,^/2,o 



W^2' (Q2: 



From Lemma 3 we also obtain the estimate in the ’’weak” norm 

T- E E w^h'^xhxi-, , 






and convergence rate estimate 






'I 1/2 

)■ < Ct 


■ ) 


I - 


dt 



+chl 



*llw|(o,i) (ih 



W2' (Qi) 



i2(0.T) 

M^2’“(Q2)

Abstract. Two elliptic equations with power and interior boundary layers,
respectively, in a square are considered. The elliptic problems are
reduced to systems of ordinary differential equations by the method of
lines. The fitted operator technique is used to construct the difference
schemes. Uniform convergence of the scheme for the first problem is proved.



1 Introduction and Statement of the Problems 

The numerical solution of multi-dimensional singularly perturbed differential
equations is a highly complicated and problem-dependent process. The solution
may contain interactions between different layers. A method developed for a
particular problem may or may not work for another one with stronger (or weaker) layers.
Methods which work well in one space dimension may or may not be easily
extended to two or three dimensions.

The analysis of this talk is based on solving singularly perturbed ODE problems
that generalize to higher spatial dimensions. This is the philosophy of the
method of lines (MOL). In the MOL applied to two-dimensional elliptic problems
the discretizations with respect to the first ($x$) and second ($y$) independent
variables are decoupled and analyzed independently. We discuss the MOL for the
following two problems.



1.1 Formulation of Problem 1 (P1)



Some physical processes, such as pollution transfer, lead to elliptic problems with a
variable coefficient of turbulent diffusion. When the diffusion coefficient is a linear
function of a coordinate, we have a problem with a power layer:



d^u , . d'^u . .du , ^du 



c{x,y)u 



f{x,y), (x, 2 /)gG, 



G = {0 < X < 1, 0 < ?/ < 1}, M |_Td= 0, y) + ^m( 1, y) = 0, T = u FMix, 

(1)

where the functions $a$, $w$, $c$, $f$ are sufficiently smooth,



£ > 0, a{x,y) > a> 0, w{y) > /3 > 0, c{x,y) > 0, c(x, j/)+4o^(x, j/) >0, <5 > 0. 

( 2 ) 



1.2 Formulation of Problem 2 (P2) 

The second problem is a singularly perturbed reaction-diffusion equation with
known singular sources; $f$ is smooth at least in $\Omega \setminus S$,

+ ^ = J Q{t)s(y-l{t)'^dt (3) 

s 



u = 0 on df2 = r. 

The solution to (3) satisfies 



= 0 and 




Q , 



( 4 ) 

( 5 ) 




Fig. 1. Semidiscretization: 
regular A and irregular points B, C 




Fig. 2. Local coordinate system 



Here $[\cdot]_S$ is the jump of the corresponding quantity across $S$, $\delta(\cdot)$ is the
Dirac delta function and the interface curve $S$, given by $(x(t), y(t))$, is
parametrized by arclength $t$. The normalized tangent direction to $S$ at $t$ is
$(x'(t), y'(t)) = (-\sin\theta(t), \cos\theta(t))$, where $\theta(t)$ is the angle from
the vector $(1,0)$ to the outward normal $(\cos\theta(t), \sin\theta(t))$ to $S$ at
$t$. For simplicity we shall explain our construction for the interface curve
presented in Fig. 2.

It is well known that, in the absence of an interface, boundary layers at
$x = 0, 1$ and corner layers at the four corners of the square appear when $\varepsilon$
is small. Now, in addition, interior layers near the interface and interface
corner layers at the intersection points of the interface with the boundary $\Gamma$
occur.

The main goal of the paper is to construct difference schemes for (P1) and
(P2) with the property of uniform convergence with respect to the small
parameter on any uniform mesh.

In equation (1) the derivatives with respect to $x$ are discretized. For the
construction of the difference scheme the fitted operator method is applied [1]. The
uniform convergence in the strong norm of the difference scheme solution to the
solution of the differential problem is proved.

For problem (P2) it is convenient to perform the semidiscretization with
respect to $y$. The immersed interface method (IIM) is applied on a uniform
Cartesian grid. The main idea of the IIM [2] consists in modifying the
difference scheme at the irregular points (Fig. 1) near the interface by using the jump
conditions. The ODEs obtained in this way are again solved by the fitted operator method.

The proofs of all statements in this paper and numerical experiments will be
included in forthcoming papers of the authors.



2 Problem (P1)

The following proposition describes the layer which arises in equation (1) as
$\varepsilon \to 0$.

Lemma 1. For the solution of (P1) and its derivatives the following estimates hold:

llull = max|u(x,y)| < a“^||/|| 
x,y 



d^u 

dx^ 



<C\l + e^ ^exp{ae ^{x 



1))] , fc < 4, 



( 6 ) 



du C 
— < • 

dy e + y 

If in addition (3 > 1, then 



du 

dy 



<C + C 



e + y 



/ 3-1 



e + y 



(7) 

(8) 



Here and below, $C$ denotes a generic positive constant that is
independent of the perturbation parameter $\varepsilon$.

In order to approximate the boundary value problem (1), we introduce the
following notation:

$$w_1 = \{x_i = i h_1,\ i = 0,\dots,N\}, \quad w_2 = \{y_j = j h_2,\ j = 0,\dots,M\}, \quad w = w_1 \times w_2, \quad v_i(y) = v(x_i,y).$$

The semidiscrete scheme takes the form



L,v = {e+y)v” +w{y)Vi+e'^ — ^ -a^{y) ^'‘ ^ -Ci{y)v^{y) 



My), 

(9) 



^^i(0) = Wi(l) = 0, 0 < i < N, vo{y) = 0, ^ + Svn = 0, (10) 

hi 

where a* = a{xi,y),Ci = c{xi,y),fi = f(xi,y), i = 1, 

We begin the investigation of (9), (10) with the following maximum principle.

Lemma 2. Let e > 0,ai > 0. Let there exists a vector-function 4>{y)> such that 

4>{y) >0, L,4><0, 0<i<N, D^(j) > 0. (11) 

Let 'L{y) is sufficiently smooth vector-function and 



< 0, !f*(0) > 0, !f*(l) > 0,0 < t tf'o(j/) > 0, H'‘!f>0. (12) 

Then for all i , y G {0 , 1) 'f'i(y) > 0. 

The next lemma establishes convergence of the solution v of the semidiscrete 
problem (9), (10) to the solution u of the differential problem (1). 

Lemma 3. The error $z_i(y) = v_i(y) - u(x_i,y)$ satisfies the inequality

$$\max_y |z_i(y)| \le C h_1, \quad i = 0,1,\dots,N. \qquad (13)$$

The problem (9) , (10) can be rewritten as follows 

/ X , ^dvj ^ , 

+ + = «(»), 



i(0) = ?;i(l) = 0, 0 < i < iV, ■Uo(y) = 0, D'^vfiy) = 0, 



r, , , Vi+i - 2Vi + Vi-i Vi-Vi-i 

Fi = ji-\- CiVi - e h Gi . 

III 



We could use a special nonuniform mesh in $y$ (fitted meshes [1], [3], [4]) to
get approximations with the property of uniform convergence. Here, however,
following [5], we derive the fitted operator scheme
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ph-pli-i F,. 



Dj{e+yj) 



Pi^j+l Fij-\-i 

h2 Wj+1 



Dj+i{e+yj) 



where 



+ l _ n h _ p ■ _ p 1 AT 

’ Pi.O Pi.M ^ 1 

Wj+i Wj 



Po,3 = + ^Pn,3 = 0- J = 0> ■- M, 



(15) 



D, = {1-W,)h2 [(e + y,)^--^ - + , 



Fij — £ 



p’l+ij - 

hi 



+ a{xi,yj) 






- 

Pi-l,j 



+ c{xi,yj)plj + f{xi,yj). 



Finally, the uniform convergence of the fully discrete solution is proved in the
next theorem.

Theorem 1. Let $u$ be the solution of problem (1) and $p^h$ the solution of the scheme (15).
Then

$$\|p^h - [u]_w\| \le C\,[h_1 + |\ln h_2|\,h_2]. \qquad (16)$$

In the case $\beta > 1$

$$\|p^h - [u]_w\| \le C\,(h_1 + h_2). \qquad (17)$$



3 Problem (P2) 

We introduce the uniform mesh $w_y = \{y_j : y_j = jk,\ j = 0,\dots,J,\ Jk = 1\}$ and
approximate the second derivative of $u$ with respect to $y$ as follows: in regular
points A

$$\frac{\partial^2 u}{\partial y^2} = \frac{u(x,y_{j-1}) - 2u(x,y_j) + u(x,y_{j+1})}{k^2} + O(k^2), \quad x \in (0,1) \setminus (\varphi(y_{j-1}), \varphi(y_{j+1}));$$

in irregular points B, C

d'^u _ u(x, yj-i) - 2u{x, yj) + u{x, yj+i) 
dy'^ k"^ 
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( [m] yj -1 - y (x) 


du 


1 {yj-i-y{x)f 


d'^u 


1 ft2 fc2 


dy 


2ft2 


9y2 



+ O (k) , 

(x,y{x)) 



<x< ^{vj ) ; 



d'^u _ u{x, yj-i) - 2u{x, yj) + u{x, yj+i) 
dy'^ 



( M 1 yj+i - y {x) 


du 


1 {yj+i - y {x)f 


'd^u' 


1 fc2 fc2 


dy 


ft2 


dy"^ 



+ O (ft) , 

{x,v(x)) 



vivj) <x< y^ivj+i) ■ 
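
The role of the jump corrections at irregular points can be checked on a one-dimensional toy problem. The sketch below is only an illustration under stated assumptions: jumps are taken as "upper minus lower" values across an interface at `Y_STAR`, the piecewise function and all names are hypothetical, and the sign convention may differ from the one used in the paper.

```python
import math

Y_STAR = 0.503                 # interface position (illustrative)
JUMPS = (1.0, 2.0, 6.0)        # [u], [u_y], [u_yy], taken as upper minus lower

def u_lower(y):
    return math.sin(y)

def u_upper(y):
    d = y - Y_STAR
    return math.sin(y) + JUMPS[0] + JUMPS[1] * d + 0.5 * JUMPS[2] * d * d

def u(y):
    """Piecewise smooth function with prescribed jumps at Y_STAR."""
    return u_upper(y) if y > Y_STAR else u_lower(y)

def second_difference(yj, k):
    return (u(yj - k) - 2.0 * u(yj) + u(yj + k)) / k**2

def corrected_difference(yj, k):
    """Second difference at an irregular node yj just above the interface,
    with y_{j-1} = yj - k lying below it."""
    d = (yj - k) - Y_STAR
    correction = JUMPS[0] + JUMPS[1] * d + 0.5 * JUMPS[2] * d * d
    return second_difference(yj, k) + correction / k**2

k, yj = 0.01, 0.51
exact = -math.sin(yj) + JUMPS[2]        # second derivative of u_upper at yj
plain_error = abs(second_difference(yj, k) - exact)
fixed_error = abs(corrected_difference(yj, k) - exact)
```

Without the correction the central difference is polluted by a term of order $[u]/k^2$; adding the jump terms evaluated at the distance $y_{j-1} - y^*$ from the interface restores a consistent approximation, which is the mechanism the irregular-point formulas rely on.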

Here we shall describe our construction for the case of Fig. 2. For the function
$\varphi(y)$, $y \in [0,1]$, assume that its inverse $y = \varphi^{-1}(x)$
(or $y = y(x)$), $x \in [0,1]$, exists. Also,

$$t(y) = \int_0^y \sqrt{1 + \varphi'^2(\rho)}\,d\rho, \quad y \in [0,1], \qquad y(t) = t^{-1}(t), \qquad x = \varphi(y) = \varphi(y(t)),$$

$$s = \sin\theta(t), \quad c = \cos\theta(t), \quad 0 < t < t(1).$$

The jumps are calculated from (3), (5). As a result the following system of ODEs with
zero boundary conditions arises:


e'^v" + Av = F (x) + Q (x) S {x — () , (18) 



V = 




A = threediag 




2 1 
¥’ ¥ 



(FAx) \ 

A (x) = : , -F) (x) 

\Fj-iix)) 



f{x,yj) + 0 ,0 < X < v3(%-i), 

/ (.X, yj) + 4>ji , y}{yj-i) <x< 
f {x, yj) + (j)j 2 , <p{yj) <x< (f{yj+i) 
f{x,yj) + 0 ,(p{yj+i) < X <1, 



4>ji 



Q{x,y{x)) s yj-i -y{x) 

£2c2 + s 2 • /j2 + 
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4>j2 



Q{x,y{x)) s Vj+i - y (x) _ 

£2c2 + s2 • 



R = 



[/] 

£2c 2 + g2 



+ 2cs 




Q \ {vj-i -y{x)f 

£2c 2 + fi2 ^2^2 g2 J 2K^ 



Q {x) = diag 



( cQ (x,y(x)) 




cQ (x,y(x)) 


y £2(.2 _|_ g2 


•) ■ 
yi 


’ £2(;2 -(- g2 



yj-i/ 



6{x-C) 



I <5(x-Cj) 
\(5(x-0-i) 






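The eigenvalues and the corresponding eigenvectors $l_j = (l_{j1}, \dots, l_{j,J-1})^T$ of the matrix $A$ are as
follows

$$\lambda_j = \frac{2}{\varepsilon k}\,\sin\frac{j\pi}{2J}, \qquad l_{jm} = \sqrt{\frac{2}{J}}\,\sin\frac{\pi m j}{J}, \qquad m, j = 1, \dots, J-1.$$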
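
The formulas above can be checked numerically. The sketch below assumes that $A$ is the $(J-1)\times(J-1)$ tridiagonal matrix with entries $1/k^2$, $-2/k^2$, $1/k^2$ (the display defining $A$ is damaged in the source, so this is an assumption), in which case the sine vectors are orthonormal eigenvectors with eigenvalues $-\varepsilon^2\lambda_j^2 = -\frac{4}{k^2}\sin^2\frac{j\pi}{2J}$. The function name `check_eigenpair` is hypothetical.

```python
import math

def check_eigenpair(J, j, eps):
    """Return (residual, norm_sq) for the j-th sine vector of the assumed
    (J-1)x(J-1) matrix tridiag(1/k^2, -2/k^2, 1/k^2), k = 1/J."""
    k = 1.0 / J
    lam = 2.0 / (eps * k) * math.sin(j * math.pi / (2 * J))
    # components l_{jm}, m = 1..J-1, padded with the zero boundary values
    l = [0.0] + [math.sqrt(2.0 / J) * math.sin(math.pi * m * j / J)
                 for m in range(1, J)] + [0.0]
    mu = -(eps * lam) ** 2            # expected eigenvalue of the matrix
    residual = max(abs((l[m - 1] - 2.0 * l[m] + l[m + 1]) / k**2 - mu * l[m])
                   for m in range(1, J))
    norm_sq = sum(c * c for c in l)
    return residual, norm_sq

results = [check_eigenpair(J=8, j=j, eps=0.1) for j in range(1, 8)]
```

Each entry of `results` holds the eigenvalue residual and the squared Euclidean norm of one vector; under the stated assumption the residuals are at rounding level and the norms equal one, so $\lambda_j$ is the exponent rate appearing in the hyperbolic functions of the fitted scheme.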



Let $v(x)$ be the solution vector function of the ODEs (18) in which $F(x) \approx \bar F(x) = F(x_i)$, $x \in (x_{i-1}, x_i)$, $i = 1, \dots, I$, $Ih = 1$. Then for its components
we have the representation



j-i 



v" (x) = {p\j (x) V (Xi-i) + (x) v{Xi)- {Plj (x) + (x)) 

1=1 



- (py (x) D (Xi-i) + pIj (x) D (Xi)) Ij) Ij + + D{x) , 



where 



Pi-i (a;) = 



sh Xj (xj — x) 
sh Xih 



P2i (a;) = 



sh Xj (x — Xi_i) 
sh Xih 



B^ = A~^F\ D{x) = 



(Di{x) \ 


( JH 




Dj (x) = 


\Dj-i {x)J 


[ 1-0 



) Cl < X < 1 , 



dj = -Q{x,y{x)) 



(^(yj),yp
Now our difference scheme can be written in vector form as follows:

v° = v ( 0 ) = 0 , 

-Av" + CiV -B,V = Wi, i = (19) 



where 



V^ = V{1) = 0, 



Ai = lla: 



nm 5 ^nm 






Y' i- 






j-i 

% — 4 I j ^ C' ■i — 2 A j ct h Aj hi jn^j jn 5 ^ ; TTl — 1 ^ * * * ? J 1 

i=i 



f i = {A, - 0.5C,) (B* + - AD (x,_i) + C^D (xi) - BD (x,+i) + 



(x, + 0) - iA^ (xi - 0) , 

The scheme (19) has $O(h + k)$ local approximation. The accuracy with respect
to $y$ can be improved if, in the approximation of $\partial^2 u/\partial y^2$ at the irregular
nodes, one adds the jump $[\partial^3 u/\partial y^3]$, and with respect to $x$ if a piecewise linear
approximation of the coefficients is used.

The full proof of uniform convergence of the scheme is still an open question.

Abstract. Infinite-dimensional gradient method is constructed for non-
linear fourth order elliptic BVPs. Earlier results on uniformly elliptic 
equations are extended to strong nonlinearity when the growth condi- 
tions are only limited by the Hilbert space well-posedness. The obtained 
method is opposite to the usual way of first discretizing the problem. 
Namely, the theoretical iteration is executed for the BVP itself on the 
continuous level in the corresponding Sobolev space, reducing the non- 
linear BVP to auxiliary linear problems. Thus we obtain a class of nu- 
merical methods, in which numerical realization is determined by the 
method chosen for the auxiliary problems. The biharmonic operator acts 
as a Sobolev space preconditioner, yielding a fixed ratio of linear con- 
vergence of the iteration (i.e. one determined by the original coefficients 
only, independent of the way of solution of the auxiliary problems), and 
at the same time reducing computational questions to those for linear 
problems. A numerical example is given for illustration. 



1 Introduction 

This paper is devoted to an approach of numerical solution to strongly nonlinear 
fourth order elliptic boundary value problems. The usual way of the numerical 
solution of elliptic equations is to discretize the problem and use an iterative 
method for the solution of the arising nonlinear system of algebraic equations 
(see e.g. [13]). For the latter suitable preconditioning technique has to be used [3]. 
For instance, an efficient way of this is the Sobolev gradient technique [15], which 
relies on using the trace of the Sobolev inner product in the discrete spaces. This 
technique is a link towards an approach opposite to the above: namely, the it- 
eration can be executed on the continuous level directly in the corresponding 
Sobolev space, reducing the nonlinear problem to auxiliary linear BVPs. Then 
discretization may be used for these auxiliary problems. The theoretical back- 
ground of this approach is the generalization of the gradient method to Hilbert 
spaces (see e.g. [4,9,10,17], and for a class of non-uniformly monotone opera- 
tors [12]). Application to uniformly elliptic BVPs is summarized in [11]. 

The aim of this paper is to construct a class of numerical methods for strongly 
nonlinear fourth order problems, based on the Sobolev space gradient method. 

* This research was supported by the Hungarian National Research Funds AMFK
under Magyary Zoltan Scholarship and OTKA under grant no. F022228

This result extends the scope of [11] as widely as possible within the Hilbert space
well-posedness of our problem. The actual numerical realization is established by 
the choice of a suitable numerical method for the solution of the auxiliary prob- 
lems. This approach can be regarded as infinite- dimensional preconditioning by 
the biharmonic operator, yielding two main advantages. Firstly, a favourable ra- 
tio of convergence is achieved for the iteration. Secondly, computational problems 
are reduced to those arising for the auxiliary linear problems for the biharmonic 
operator, since the nonlinearity is entirely handled by the outer simple GM iter- 
ation. The numerical solution of the former is much developed (see e.g. [5,7,16]). 
This paper focuses on constructing the Sobolev space GM and proving its linear 
convergence in the corresponding energy norm. A simple numerical example is 
given to illustrate the growth conditions involved in strong nonlinearity of the 
lower order terms. 

2 Formulation of the Dirichlet Problem 

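The following notations will be used throughout the paper. For $u \in H^2(\Omega)$ the
Hessian is denoted as usual by $D^2 u$. For any $H, V : \Omega \to \mathbb{R}^{N\times N}$ let

$$H \cdot V := \sum_{i,k=1}^N H_{ik} V_{ik}, \qquad \operatorname{div}^2 H := \sum_{i,k=1}^N \partial_{ik} H_{ik}.$$

We consider the boundary value problem

$$\begin{cases} T(u) \equiv \operatorname{div}^2 A(x, D^2 u) - \operatorname{div} f(x, \nabla u) + q(x,u) = g(x), \\ u|_{\partial\Omega} = \partial_\nu u|_{\partial\Omega} = 0 \end{cases} \qquad (1)$$

with the following conditions:

(C1) $N = 2$ or $3$, $\Omega \subset \mathbb{R}^N$ is a bounded domain, $\partial\Omega \in C^4$ (cf. also the third
remark).

(C2) $A \in C^1(\overline\Omega \times \mathbb{R}^{N\times N}, \mathbb{R}^{N\times N})$, $f \in C^1(\overline\Omega \times \mathbb{R}^N, \mathbb{R}^N)$, $q \in C^1(\overline\Omega \times \mathbb{R})$ and
$g \in L^2(\Omega)$.

(C3) There exist constants $m' \ge m > 0$ such that for any $(x,\Theta) \in \overline\Omega \times \mathbb{R}^{N\times N}$
the Jacobian array $A'_\Theta(x,\Theta)$ (in $\mathbb{R}^{N^2\times N^2}$) is symmetric and its eigenvalues $\Lambda$ satisfy
$m \le \Lambda \le m'$.

(C4) There exist constants $\kappa, \beta \ge 0$, further, $2 \le p$ (if $N = 2$) and $2 \le p \le 6$
(if $N = 3$) such that for any $(x,\eta) \in \overline\Omega \times \mathbb{R}^N$ the Jacobian matrix $f'_\eta(x,\eta)$
(in $\mathbb{R}^{N\times N}$) is symmetric and its eigenvalues $\mu$ satisfy
$0 \le \mu \le \kappa + \beta|\eta|^{p-2}$.

(C5) For any $(x,\xi) \in \overline\Omega \times \mathbb{R}$ there holds $0 \le \partial_\xi q(x,\xi)$.

3 Sobolev Space Background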

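We introduce the real Hilbert space

$$H_0^2(\Omega) := \{u \in H^2(\Omega) : u|_{\partial\Omega} = \partial_\nu u|_{\partial\Omega} = 0\}. \qquad (2)$$

It is well-known that

$$\langle u, v \rangle_{H_0^2} := \int_\Omega D^2 u \cdot D^2 v \qquad (3)$$

defines an inner product on $H_0^2(\Omega)$ equivalent to the usual $H^2(\Omega)$ one.

Remark 1. (See [1].) Assumptions (C1) and (C4) imply the following Sobolev
embeddings. There exist constants $K_\infty > 0$ and $K_p > 0$ such that

$$H_0^2(\Omega) \subset C(\overline\Omega), \qquad \|u\|_\infty \le K_\infty \|u\|_{H_0^2}, \qquad (4)$$

$$H_0^2(\Omega) \subset W_0^{1,p}(\Omega), \qquad \|u\|_{W_0^{1,p}} \le K_p \|u\|_{H_0^2} \qquad (5)$$

$(u \in H_0^2(\Omega))$, where $\|u\|_{W_0^{1,p}} := \big(\int_\Omega |\nabla u|^p\big)^{1/p}$. Further,

$$\|u\|_{L^2(\Omega)} \le \lambda^{-1/2}\,\|u\|_{H_0^2} \qquad (u \in H_0^2(\Omega)), \qquad (6)$$

where $\lambda$ denotes the smallest eigenvalue of $\Delta^2$ on $H^4(\Omega) \cap H_0^2(\Omega)$.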



4 The Gradient Method 

4.1 Gradient Method for the Dirichlet Problem 

Proposition 1. The following equation defines an operator
$F : H_0^2(\Omega) \to H_0^2(\Omega)$:

$$\langle F(u), v \rangle_{H_0^2} = \int_\Omega \big[ A(x, D^2 u) \cdot D^2 v + f(x, \nabla u) \cdot \nabla v + q(x,u)\,v \big] \qquad (u, v \in H_0^2(\Omega)).$$

Proof. We use conditions (C3)-(C4) for the functions $A$ and $f$. Lagrange's
inequality yields the following estimate for the right side integral (with suitable
constants $m_0, m_0', \kappa', \kappa'', \beta', \gamma > 0$):

$$\int_\Omega \big[ (m_0 + m'|D^2 u|)\,|D^2 v| + (\kappa' + \beta'|\nabla u|^{p-1})\,|\nabla v| + |q(x,u)\,v| \big] \le$$

$$(m_0 + m'\|u\|_{H_0^2})\,\|v\|_{H_0^2} + (\kappa' + \beta'\|\nabla u\|_{L^p}^{p-1})\,\|\nabla v\|_{L^p} + \gamma \max_{x\in\overline\Omega} |q(x,u)|\ \|v\|_\infty.$$

The Sobolev embeddings (4)-(5) yield that the norms in this estimate are finite,
and for fixed $u \in H_0^2(\Omega)$ the discussed integral defines a bounded linear
functional on $H_0^2(\Omega)$. Hence the Riesz theorem provides the existence of
$F(u) \in H_0^2(\Omega)$. □

Remark 2. Since the embeddings (4)-(5) are sharp, the growth conditions (C3)-(C5)
are the strongest that allow Proposition 1 to hold in $H_0^2(\Omega)$.

A weak solution $u^* \in H_0^2(\Omega)$ of problem (1) is defined in the usual way by

$$\langle F(u^*), v \rangle_{H_0^2} = \int_\Omega g\,v \qquad (v \in H_0^2(\Omega)). \qquad (7)$$

Now we formulate and prove our main result on the Sobolev space gradient
method for problem (1). For this we introduce the operator $\Delta^2$ in the space
$L^2(\Omega)$ with domain $D(\Delta^2) := H^4(\Omega) \cap H_0^2(\Omega)$. Then Green's formula yields

$$\int_\Omega (\Delta^2 u)\,v = \int_\Omega D^2 u \cdot D^2 v = \langle u, v \rangle_{H_0^2} \qquad (u, v \in H^4(\Omega) \cap H_0^2(\Omega)), \qquad (8)$$

hence the energy space of $\Delta^2$ is $H_0^2(\Omega)$. Further, we will use the following
notations:

$$q(u) := \max\{\partial_\xi q(x,\xi) : x \in \overline\Omega,\ |\xi| \le u\} \qquad (u > 0); \qquad (9)$$

$$M(r) := m' + \kappa K_2^2 + \beta K_p^p\,r^{p-2} + \lambda^{-1} q(K_\infty r) \qquad (r > 0), \qquad (10)$$

where $K_2$, $K_p$ are from (5), and $\lambda$ denotes the smallest eigenvalue (or a lower
bound) of $\Delta^2$ on $H^4(\Omega) \cap H_0^2(\Omega)$.

Theorem 1. Problem (1) has a unique weak solution u* G Hq([ 2). Further, 
let uq G F['^{fl) n F[q{Q), and 

Mo := M ^||uo||jy2 + — ^||T(uo) - gWh^ (11) 

with M{r) defined in (10). For n G N let 

2 

Un+l = Uji — j Zfi , (12) 

Mq + m 

where Zn G Fl'^{Sl) H Hq{S 1) is the solution of the auxiliary problem 

f A^Zn = T{Un) - g 

\ (13) 

Then the sequence (un) converges linearly to u* , namely, 

||u„ - l|T(«o) - 9\\l^ { e N) . (14) 

mV A \tVlo + m/ 
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Proof. A Hilbert space result, given in [12], will be applied in the real Hilbert 
space H := For this purpose first some properties of the operator T 

in H, defined on D{T) := n (17), and of the generalized differential 

operator F, are proved. 

(a) There holds D R{T) since the operator in is onto by reg- 
ularity. Namely, condition (Cl) implies that for any g G the weak solution 

of A^z = g with z\qq = dvZ\QQ = 0 is in D{A'^) = n Hq{Q) [2]. 

(b) Green’s formula and (8) yield that for any u,v G n 

{F{u),v)h 2 = j T{u)v = {A~'^T{u),v}h 2 ■ (15) 

n 

Hence ^ . 

(c) F has a bihemicontinuous Gateaux derivative F' such that for any u G 
Fl^{ff), the operator F'{u) is self-adjoint and satisfies 

m||h||^. < {F\u)Kh)Hi < M(||n||^.)||h||^, {h G i7^(G)) (16) 

with the increasing function M defined in (10). These properties can be checked 
by suitably modifying the proof of the corresponding result for uniformly ellip- 
tic problems [11] (quoted in the introduction), now using Sobolev embedding 
estimates. This works in the same way for the existence and bihemicontinuity 
of F' as for verifying (16), hence for brevity the former is left to the reader. The 
operators F'{u) are given by the formula 

{F'{u)h,v) = J [A'q{x, D'^u)D‘^h- D^v + f^{x,\7u)\7h-\/v + q'^{x,u)hv] (17) 
o 

(for any u,v,h G JlQ(f2)). Now we use conditions (C3)-(C5). The symmetry 
assumptions on A'^ and /' imply that F'(u) is self-adjoint. Further, there holds 

m J \D^h\'^ < {F\u)h,h)H 2 < J [m'\D'^h\'^ + {k + l3\\7u\P-^) \Vh\^ + q{u)h‘^] 
n n 




using Remark 1. Thus (16) is verified. 

The obtained properties (a)-(c) of $T$, $F$ and the auxiliary operator $B := \Delta^2$ yield that the conditions of Theorem 3 and Corollary 2 in [12] are satisfied in the space $H = L^2(\Omega)$. Hence equation $T(u) = g$ has a unique weak solution $u^* \in H_B = H_0^2(\Omega)$, and for any $u_0 \in D(B)$ the sequence $u_{n+1} = u_n - \frac{2}{M_0 + m} B^{-1}(T(u_n) - g)$ converges to $u^*$ according to the estimate (14).
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Remark 3. (13) can be written in the weak form 




(T(m„) - g)v 



{v G 



(18) 



This is also valid when 17 violates condition dfi G in (Cl) (i.e. TT'-regularity 
is not guaranteed for the solutions of (13)). Moreover, numerical realization 
essentially relies on (18), and the aim of the strong form (13) in the theoretical 
iteration is rather to indicate clearly the preconditioning role of . 



4.2 Generalizations 

(a) We may set a weight function w G L°°{S1) in (18), which means precondi- 
tioning formally by the operator Bz = div^ {w z) . For instance, a piecewise 
constant w may give more accurate approximation of the bounds of T. 

(b) The GM in Theorem 2 works similarly, involving the weak formulation, 
for mixed boundary conditions 

u\aa = (a(a:) A{x, D‘^u)v ■ v + 7 (x)5j.m) = 0 , (19) 

where a ,7 G C(9l7), a, 7 > 0, -I- 7 ^ > 0 a.e. on dfl. Defining B := Z\^ with 

the domain D{B) := {u G = {a{x)duU^ + 'y{x)duu)\gQ = 0}, and 

letting Tq, := {x G dfi : a(x) > 0}, the energy norm is now 

/ -{d^ufda {u & H‘^{fi), u\qq = d, d^u\QQ\r^ = d) . 

5 Numerical Example 

As referred to in the introduction, the GM in Theorem 2 presents a class of 
methods wherein actual numerical realization is established by the choice of a 
suitable numerical method for the solution of the linear auxiliary problems. For 
instance, the latter method may be a FDM or FEM discretization. (The FEM 
for 4th order linear equations is highly developed [5,7,16], making its coupling 
to the GM as promising as has already been achieved in this way for 2nd order 
uniformly elliptic problems [8,9].) An important special case might be the use 
of one fixed grid for each linear problem, providing a suitably preconditioned 
nonlinear FEM iteration (cf. the Sobolev gradient technique [15]). 

Here we consider the simplest case of realization to illustrate the theoretical 
convergence result: the auxiliary problems are solved exactly. (Besides simplicity, 
this actually realizes infinite-dimensional preconditioning.) The main purpose of 
the model problem is to give an example of the growth conditions involved in 
strong nonlinearity of the lower order terms. 

In the sequel we will use notations 

v'^ := and [x'=] := v'l + for v = {vi,V 2 ) G R^ fc G N+. 
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We consider the following semilinear model problem: 



T{u) = — div (Vm)^ + ue“^ = g{x\,X 2 ) 

'^\dn = = 0 



in 17 = [0, C 

^ ^ ( 20 ) 



g{xi,X 2 )= 3(2 _ 0.249 cos 2xi) (2 - 0.249 cos 20:2) ' 

Using the notations in (C1)-(C5) after (1), we now have A{x,0) = 0, f{x,g) = 
q{x,^) = , hence m = m' = 1, k = 0, /3 = 3, p = 4. The boundary 

condition (19) holds with a = 1, 7 = 0, hence Tq, = dSl. Therefore, defining 
B = with this BC, the energy space is 



Hb = {u € H^{f2) : u\ga = 0} with \\u\\b= [ \D^u\'^ . 

Jn 

Then (20) is the formal Euler equation of the potential J : Hb — *■ R, 

J{u) ■■= + \ • 

In order to apply Theorem 2, we need the values of the constants in (10). These 
can be estimated by suitable integration and Schwarz inequality, following [6,14] 
(for brevity the calculations are omitted), thus we obtain K 2 = 2“^/^, = 

6^/^, A = 4 and Kao = 1-2. The calculations are made up to accuracy 10“"^. We 
define the Fourier partial sum 

g{xi,X 2 ) = ^ Ofc; sin fcxi sin /x2 , o/cj = 2.3803 • 4“^^+*^ 

k,l are odd 
k + l<6 



which fulfils jj^ — g\\L‘^(o) < 0.0001. The solution of T{u) = g is denoted by u. 
Let uo = 0. Then (10) and (11) yield 



$$M_0 = 3.6903, \qquad \frac{2}{M_0 + m} = 0.4262, \qquad \frac{M_0 - m}{M_0 + m} = 0.5735.$$



Now we are in the position to apply the GM iteration (12)-(13). The main idea 
of realization is the following. In each step is approximated by Tfc^(rt„), 

where Tk„{^) is the /c„-th Taylor polynomial of , chosen up to accuracy 10“"^ 

for $|\xi| \le \|u\|_\infty$. Hence, by induction, the sequences $(z_n)$ and $(u_n)$ consist of sine-polynomials (preserving this from $\tilde g$ and $u_0$), and the auxiliary equations (13) are elementary to solve. Namely, if $h(x_1, x_2) = \sum_{k,l=1}^{s} a_{kl} \sin kx_1 \sin lx_2$, then the solution of $\Delta^2 z = h$ with $z|_{\partial\Omega} = \partial_\nu^2 z|_{\partial\Omega} = 0$ is given by

$$z(x_1, x_2) = \sum_{k,l=1}^{s} \frac{a_{kl}}{(k^2 + l^2)^2}\, \sin kx_1 \sin lx_2 .$$
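The coefficient-wise solve above can be sketched in a few lines. This is a minimal illustration in Python with NumPy, not the authors' MATLAB code; the function names and the array layout (entry `[k-1, l-1]` holds the coefficient of $\sin kx_1 \sin lx_2$) are assumptions made here, and the sine products are taken on the square $[0,\pi]^2$.

```python
import numpy as np

def solve_biharmonic_sine(a):
    """Return the sine coefficients of z solving Delta^2 z = h.

    a[k-1, l-1] is the coefficient of sin(k x1) sin(l x2) in h
    (hypothetical storage convention). Each mode is an eigenfunction
    of Delta^2 with eigenvalue (k^2 + l^2)^2, so the solve is a
    coefficient-wise division.
    """
    a = np.asarray(a, dtype=float)
    k = np.arange(1, a.shape[0] + 1)[:, None]
    l = np.arange(1, a.shape[1] + 1)[None, :]
    return a / (k**2 + l**2) ** 2

def eval_sine_series(coef, x1, x2):
    """Evaluate sum_{k,l} coef[k-1, l-1] sin(k x1) sin(l x2) at one point."""
    k = np.arange(1, coef.shape[0] + 1)
    l = np.arange(1, coef.shape[1] + 1)
    return np.sin(k * x1) @ coef @ np.sin(l * x2)

# Example: h = 4 sin(x1) sin(x2) + 100 sin(x1) sin(3 x2)
a = np.zeros((3, 3))
a[0, 0] = 4.0     # (1^2 + 1^2)^2 = 4
a[0, 2] = 100.0   # (1^2 + 3^2)^2 = 100
z = solve_biharmonic_sine(a)
```

One step of the iteration (12) then amounts to subtracting `2 / (M0 + m) * z` from the coefficient array of the current iterate, provided the right-hand side $T(u_n) - g$ has first been expanded in the same sine basis.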
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The algorithm has been performed in MATLAB. 

(The high-index almost zero coefficients were dropped within accuracy 10“^, 
and the error was calculated from the residual.) The following table contains the 
error e„ = ||u„ — u\\h^ versus the number of steps n. 



step n    1        2        3        4        5        6        7
error     0.1173   0.0556   0.0275   0.0158   0.0118   0.0065   0.0037

step n    8        9        10       11       12       13       14       15
error     0.0021   0.0014   0.0010   0.0008   0.0005   0.0003   0.0002   0.0001
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Abstract. We study a semilinear parabolic partial differential equa- 
tion of second order in a bounded domain f? C with nonstandard 
boundary conditions (BCs) on a part Pnon of the boundary df2. Here, 
neither the solution nor the flux are prescribed pointwise. Instead, the 
total flux through /Won is given and the solution along /Uon has to follow 
a prescribed shape function, apart from an additive (unknown) space- 
constant a{t). 

Using the semidiscretization in time (so called Rothe’s method) we pro- 
vide a numerical scheme for the recovery of the unknown boundary data. 

Keywords: nonlocal boundary condition, parameter identification, semi- 
linear parabolic BVP 

2000 MSC: 35K20, 35B30, 65N40 

1 Introduction 

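Let $N \in \mathbb{N}$, $N \ge 2$. We consider a bounded open domain $\Omega \subset \mathbb{R}^N$ with a Lipschitz continuous boundary $\partial\Omega = \Gamma_{Dir} \cup \Gamma_{Neu} \cup \Gamma_{non}$. The index “Dir” stands for the Dirichlet part, “Neu” for the Neumann part of $\partial\Omega$, while “non” is the part of the boundary with a nonlocal type BC. The three parts $\Gamma_{Dir}$, $\Gamma_{Neu}$ and $\Gamma_{non}$ are supposed to be mutually disjoint. Moreover, we assume that $\Gamma_{non}$ is non negligible and that $\Gamma_{non}$ and $\Gamma_{Dir}$ are not adjacent, i.e.,

$$|\Gamma_{non}| \ne 0; \qquad \overline{\Gamma}_{non} \cap \overline{\Gamma}_{Dir} = \emptyset. \tag{1}$$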

We study the following semilinear parabolic partial differential equation of 
second order 

- V • (Ar(t,x)VM(t,x)) = /(t,x,u(t,x)) in(0,T)xf7. (2) 

* This work was supported by the VEO-project no. Oil VO 697. 



L. Vulkov, J. Wasniewski, and P. Yalamov (Eds.): NAA 2000, LNCS 1988, pp. 467—474, 2001. 
(c) Springer- Verlag Berlin Heidelberg 2001 






Roger Van Keer and Marian Slodicka 



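We consider nonstandard boundary conditions on $\Gamma_{non}$ of the type

$$\begin{aligned} u(t,x) &= g_{non}(t,x) + \alpha(t) && \text{in } (0,T) \times \Gamma_{non}, \\ -\int_{\Gamma_{non}} K(t,x) \nabla u(t,x) \cdot \nu \, d\gamma &= s(t) && \text{in } (0,T). \end{aligned} \tag{3}$$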



Here, the time dependent enforced total flux $s(t)$ through $\Gamma_{non}$ is given and the solution along $\Gamma_{non}$ has to preserve the prescribed shape $g_{non}$, apart from an additive (unknown) time-dependent degree of freedom $\alpha(t)$, which has to be determined as a part of the problem.

There are standard pointwise boundary conditions on $\Gamma_{Dir}$ and $\Gamma_{Neu}$:

$$\begin{aligned} u(t,x) &= g_{Dir}(t,x) && \text{in } (0,T) \times \Gamma_{Dir}, \\ -K(t,x) \nabla u(t,x) \cdot \nu - g_{Rob}(t,x)\, u(t,x) &= g_{Neu}(t,x) && \text{in } (0,T) \times \Gamma_{Neu}. \end{aligned} \tag{4}$$



The initial condition is given as 

u(0,x) = Mo(x) G in 17. (5) 



We suppose that for the functions goir on (0, T) x F^ir and gnon on (0, T) x 
Fi\jeu there exists a prolongation g of these functions to the whole domain 17 
such that 

gGL2{(0,T)H\n)). ( 6 ) 

With respect to this assumption one can easily see that in (1) Fuir and Fnon 
had to be required to be non-adjacent. 

The right-hand side / is supposed to be globally Lipschitz continuous in all 
variables and the data functions gNeu, gRob,P, K obey 

0 < gRob < c, a.e. in (0, T) x F^eu 

ffAfeu G ^2 ((0, T), L2(/ATeu)) (7) 

0 < Co < K(t,x) < C a.e. in (0,T) x 17. 



This type of initial boundary value problems (IBVPs) arises in the determination
of the magnetic properties of materials used in electric machinery. In practice, 
the original problems are highly nonlinear in that memory properties (hysteresis 
behaviour) of the material must be taken into account. The nonlocal BCs (3) 
considered in the IBVP (2)-(5) correspond to the situation when the average 
flux in the lamination is enforced, from which the magnetic field strength at the
surface of the lamination must be derived. Such models have been studied e.g. 
by Van Keer, Dupre & Melkebeek in [6]. In that paper the authors suggested a 
modified finite element-finite difference scheme for the numerical approximation 
of the unknown u and a. The existence and uniqueness of the exact solution has 
not been discussed there. To deal with the nonlocal BC in a variational setting 
in [6] a space of trial and test functions has been considered with constant 
traces on Fnon- Therefore, the standard FE packages could not be used for the 
numerical computations. 

In this paper we prove the uniqueness of the solution to the IBVP (2)-(5) 
and provide a numerical method for the recovery of the unknown boundary data 
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a. First, we give the variational formulation of the problem (2)-(5). We apply 
Rothe’s method for the time discretization, see Kacur [2] or Rektorys [3]. We 
have to solve a recurrent system of elliptic BVPs at each successive time point ti 
of a suitable time partitioning. We apply the ideas from Slodicka & Van Keer [5] 
to obtain a weak solution $u_i \approx u(t_i)$ at each time step $t_i$ and to determine the unknown value $\alpha_i \approx \alpha(t_i)$.



2 Variational Formulation, Uniqueness 



We denote by the standard L 2 -scalar product of real or vector- valued 

functions w and z on a set M, i.e., = / wz. The corresponding norm 

Jm 

is denoted by ||rc||Qj^ = ^ (w, The first-order Sobolev space is 

equipped with the usual norm H-Hj^ 

Iklllr? = + (Vw, Vw)^ = lkllo,f2+ 

We define the following space V of test functions (in fact a Hilbert space) 

V = {(p e ip = 0 on Foir, P = const on Fnon}, 

which is endowed with the induced norm H-Hj^ ^ from 

The variational formulation of the IBVP (2)-(5) reads as follows: 

Problem 1. Find a couple (u,a) obeying 

1. ue C([0,T],T2(t2))nL2((0,T)iJi(f2)), 

2. ||GL2((0,r),T2(f2)), 

3. U = QDir on (0, T) X Fair, 

4. li Qnon — Q: G F 2 ((0; ^)) Oil Tnon? 

5. u(0) = uo in 

such that for all $\varphi \in V$ and for almost all $t \in [0, T]$ there holds

= - i9Neu{t),p)r^^^ - 

Now, we prove the uniqueness of the solution to the IBVP 1. 

Theorem 1. Let (1), (5), (6) and (7) be satisfied and let f be globally Lipschitz continuous in all variables. Then the IBVP 1 admits at most one weak solution.



Proof. Suppose that there exist two solutions (««,«) and {up, /3) to the IBVP I. 
Subtract the identity (8) for Ua from the corresponding identity for up, take 
P = {ua — up){t) G V and get 




^^,{u^-up){t)] 



+ {K{t)\/{Ua - Up){t),\/{Ua 



F {9Rob{t){Ua Up){t)i{Ua (^)) 

= {f{t, Ua{t)) - f{t, Up{t)), {Ua - Up){f))^ . 






dt 






We denote by C a generic positive constant. Integrating this equality over t G 
(0,s), for any s G (0,T), and taking into account the assumption (7) and the 
Lipschitz continuity of the right-hand side /, we arrive at 










dt<C 






From Gronwall’s lemma we conclude that Ua = up in the space C ([0, T] , L2{f2))0 
L 2 ((0, T)i7^(l7)) . For a and f3 we successively deduce that 



- P{t) \ = f \a{t) - /3(f) I = — ^ f \uc,(t) - up{t)\ 

PnonI J Fnan P non | J Tnon 

— ^ f Wolit) — Up{t)\ < C \\Ua{t) — Up{t)\\g QQ 

Jon 

< C \\Ua(t) — Up(t)\\-^^j^ , 



where in the last but one step we used the Cauchy-Schwarz inequality in L 2 {dfi) 
and in the last step we invoked the trace inequality. Integrating the inequality 
with respect to the time variable we get 




\a{t) - P{t)f dt<c[ \\ua{t) 

Jo 



up{t)\\lj2 



Recalling that Ua = up in the space L 2 ((0, T)iJ^(I?)), we conclude that a{t) = 
(3{t) a.e. in (0,T). □ 



3 Time Discretization 

We divide the time interval $[0,T]$ into $n \in \mathbb{N}$ equal subintervals $(t_{i-1}, t_i)$, where $t_i = i\tau$, with the time step $\tau = T/n$. We introduce the following notation for any abstract function $z$ on $[0,T]$:

$$z_i = z(t_i), \qquad \delta z_i = \frac{z_i - z_{i-1}}{\tau}.$$

The application of the usual Rothe method to the IBVP 1 is complicated by the nonlocal BC on $\Gamma_{non}$. Here, we apply the ideas from Slodicka & Van Keer [5] for elliptic problems. We consider the following linear elliptic BVP with nonlocal BCs at each time point $t_i$, $i = 1, 2, \ldots$.

Problem 2. Find a couple (ui,ai) G x R obeying 

1 . Ui — t]Diri ^Dir 

2. Ui Qnorii — OU Pnon 

such that 

{Pi5u^,(fi)^ -k (A",Vwi, Vv?)^ -k {9RobiU^,(fi)p^^^ 

~ Ui—l), p) Q ~ {gNeUi j p) r^eu ~ ■®®'P|Craon I 



ipGV. 



(9) 
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The existence of a weak solution (ui,ai) « a;), a(ti)) at each ti is shown 

below by invoking some arguments from [5]. 

Theorem 2. Suppose that (1), (6), (7) hold and assume that $u_0 \in L_2(\Omega)$. Then, there exists a unique solution $(u_i, \alpha_i)$ to the BVP 2 for any $i = 1, \ldots, n$.

Proof. We introduce a subspace Vb of F by 

Vq = {q}& ip = 0 on Puir U Pnon}- 

We define the bilinear form a \V x F ^ R by means of 

( Pi'll) \ 

+ {pRobiW, p) 

and the linear functional f on by 

{F,P) = - {gNeui,p)r^^^ + 

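We consider two auxiliary problems at each time step $t_i$. The first one takes into account the source term and the nonhomogeneous BCs, i.e.:

Find $v_i \in H^1(\Omega)$ obeying

$$v_i = g_{Dir_i} \ \text{on } \Gamma_{Dir}, \qquad v_i = g_{non_i} \ \text{on } \Gamma_{non}, \qquad a(v_i, \varphi) = \langle F, \varphi \rangle \quad \forall \varphi \in V_0. \tag{10}$$

In the second problem the right-hand side and the Dirichlet data on $\Gamma_{Dir}$ are taken to be zero, while the trace of the solution has to take the constant value one throughout $\Gamma_{non}$.

Find $z_i \in H^1(\Omega)$ obeying

$$z_i = 0 \ \text{on } \Gamma_{Dir}, \qquad z_i = 1 \ \text{on } \Gamma_{non}, \qquad a(z_i, \varphi) = 0 \quad \forall \varphi \in V_0. \tag{11}$$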

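The Lax-Milgram lemma implies the existence and uniqueness of weak solutions $v_i$ and $z_i$, for all $i = 1, \ldots, n$. Applying the principle of superposition, the function $u_{\alpha_i} = v_i + \alpha_i z_i$, with $\alpha_i \in \mathbb{R}$, satisfies the BVP

$$u_{\alpha_i} = g_{Dir_i} \ \text{on } \Gamma_{Dir}, \qquad u_{\alpha_i} = g_{non_i} + \alpha_i \ \text{on } \Gamma_{non}, \qquad a(u_{\alpha_i}, \varphi) = \langle F, \varphi \rangle \quad \forall \varphi \in V_0. \tag{12}$$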

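Now, we have to find such an $\alpha_i$ for which the total flux of $u_{\alpha_i}$ through $\Gamma_{non}$ is just $s_i$. To this end, similarly as in [5], we introduce so called “total flux functionals” on $V$

$$\begin{aligned} \langle G(z_i), \varphi \rangle &= -a(z_i, \varphi), \\ \langle G(v_i), \varphi \rangle &= -a(v_i, \varphi) + \langle F, \varphi \rangle, \\ \langle G(u_{\alpha_i}), \varphi \rangle &= -a(u_{\alpha_i}, \varphi) + \langle F, \varphi \rangle, \end{aligned} \tag{13}$$

representing the total flux of $v_i$, $z_i$ and $u_{\alpha_i}$ through $\Gamma_{non}$. It follows that the constant $\alpha_i$ must fulfill

$$\langle G(u_{\alpha_i}), \tilde{1} \rangle = \langle G(v_i), \tilde{1} \rangle + \alpha_i \langle G(z_i), \tilde{1} \rangle = s_i,$$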
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where 1 is any smooth function satisfying 



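$$\tilde{1} = \begin{cases} 1 & \text{on } \Gamma_{non}, \\ 0 & \text{on } \Gamma_{Dir}. \end{cases} \tag{14}$$

Therefore,

$$\alpha_i = \frac{s_i - \langle G(v_i), \tilde{1} \rangle}{\langle G(z_i), \tilde{1} \rangle}. \tag{15}$$

Here, $\langle G(z_i), \tilde{1} \rangle \ne 0$. Otherwise we would get a contradiction with the trace of $z_i$ on $\Gamma_{non}$ and the uniqueness of the solution to the BVP (see [4])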



V • i-K,Ww) + ^ = 0 

T 



n 



w = 0 on Fuir 

-KiVw - gRobiW = 0 on FNeu 

W = const on Fnon 



—KiVw ■ V = 0. 



□ 



It was shown that the couple {ui, ai) « (it(ti, x), a{ti)) can be constructed from 
the solution of two auxiliary BVPs with standard (local) BCs. In practice, the 
auxiliary elliptic problems must be solved numerically. 
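The superposition idea can be illustrated on a one-dimensional toy problem. The sketch below is an assumption-laden simplification, not the authors' scheme: it takes $-u'' = f$ on $(0,1)$ with $\Gamma_{Dir} = \{1\}$, $\Gamma_{non} = \{0\}$, $g_{non} = 0$ and $K = 1$, uses central finite differences instead of finite elements, and replaces the variational flux functionals by a one-sided difference for $u'(0)$ (the total flux through $x = 0$, since the outward normal there is $-1$). All function names are hypothetical.

```python
import numpy as np

def solve_dirichlet(f, left, right, n):
    """Solve -w'' = f on (0,1) with w(0)=left, w(1)=right (n interior nodes)."""
    h = 1.0 / (n + 1)
    x = np.linspace(0.0, 1.0, n + 2)
    A = (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2
    rhs = np.zeros(n) + f(x[1:-1])
    rhs[0] += left / h**2      # boundary values move to the right-hand side
    rhs[-1] += right / h**2
    w = np.empty(n + 2)
    w[0], w[-1] = left, right
    w[1:-1] = np.linalg.solve(A, rhs)
    return x, w

def left_derivative(w, h):
    """Second-order one-sided approximation of w'(0)."""
    return (-3.0 * w[0] + 4.0 * w[1] - w[2]) / (2.0 * h)

def recover_alpha(f, g_dir, s, n=50):
    """Find alpha such that u = v + alpha*z has total flux u'(0) = s."""
    x, v = solve_dirichlet(f, 0.0, g_dir, n)               # source and data, zero trace at x=0
    _, z = solve_dirichlet(lambda t: 0.0 * t, 1.0, 0.0, n)  # unit trace at x=0, no source
    h = x[1] - x[0]
    alpha = (s - left_derivative(v, h)) / left_derivative(z, h)
    return alpha, x, v + alpha * z

# f = 2, g_Dir = 0, prescribed flux s = 0: exact solution u = 1 - x^2, alpha = 1
alpha, x, u = recover_alpha(lambda t: 2.0 + 0.0 * t, 0.0, 0.0)
```

The division by the flux of $z$ is the discrete counterpart of (15); it is well defined here because $z = 1 - x$ has nonzero slope at the nonlocal boundary point.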



4 Numerical Experiments 

In the previous section we described the numerical scheme for the time discretiza- 
tion. For the space discretization of the two elliptic auxiliary problems at each 
time step, we use a mixed nonconforming finite element method, where the usual 
nonconforming basis on a triangle has been enriched by a bubble (polynomial of 
third order vanishing at the boundary) . For details see Arnold and Brezzi [1] . 



4.1 Example 1 

The first example is a linear parabolic problem. Let $\Omega$ be the rectangular domain $\Omega = (0, 0.5) \times (0, 0.02)$. Its boundary is split into three parts: $\Gamma_{Dir}$ (right), $\Gamma_{Neu}$ (top and bottom) and $\Gamma_{non}$ (left part of $\partial\Omega$). We consider the following IBVP:

du, 

10“^-- Au = 0 in (0, 1) X 17 

u{t) = I0^sin(27Tt) in (0,1) x F^ir 
— Vm • iz = 0 in (0, 1) X T/veti 

u{t) = a{t) in (0, 1) x 

— J Vu{t) • iz dy = 27rcos(27rt) in (0, 1) x Fnon 

u(0) = 0 in 17. 
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We have used the time step r = 0.005 and a fixed uniform triangular mesh 
consisting of 200 triangles. In Figure 1, the function a{t) (i.e., the space-constant 
unknown value on Fnan) is plotted versus the function s{t). The loop in Figure 1 




Fig. 1. Example 1: The behaviour of $\alpha(t)$ (on the x-axis) versus $s(t) = 2\pi \int_0^t \cos(2\pi s)\, ds$ (on the y-axis) for $t \in [0, 1]$



is a consequence of the periodicity of the boundary conditions in the problem 
setting. Such curves can be obtained in the computation of the electromagnetic 
losses in a lamination of an electric machine, based upon the Maxwell equations. 
The domain il can be seen as the cross section of the yoke. The surface enclosed 
by an (a, s)-loop is a measure of the electromagnetic losses over one time period. 
In fact, the practical problem setting is nonlinear and should include also the 
hysteresis behaviour of the material (cf. [6]). 



4.2 Example 2 



The second example is a semilinear parabolic problem with a nonlinear right-hand side. We consider the unit circle as the domain $\Omega$. The boundary $\partial\Omega$ is split into two halves by the x-axis. The top part is $\Gamma_{non}$ and the bottom part is $\Gamma_{Neu}$. We consider the following evolution IBVP for $(u(t,x,y), \alpha(t))$:




— Vu(f,x) 



du . •? 7 dv . 

— - Au = u- - Av 

u{t) = v{t) — -I- a{t) 

Vu • 1 / dq = — / Vu • u dy 

n ^ ^non 

u — u{t, x) = — Vu(t, x) • 1 / — v{t, x) 
rt(0) = u(0) 



in (0, 1) X I? 
in (0,1) X FjiQji 
in (0, 1) X FjiQii 

in (0, 1) X 
in 17, 



where 



v{t, X, y) = txcos{ny) + ysin(Trtx) -I- t^x^. 
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The exact solution is 

u{t,x,y) = v{t,x,y) 
a{t) = 

We have chosen the time step r = 0.005 and an unstructured mesh con- 
sisting of 11872 triangles. The evolution of the absolute errors for Ui and at 
subsequent time points in the time interval [0, 1] is depicted in Figure 2. 





L2(12)-error of Ui for i — 1, . . . ,n Error of at for i = 1, . . . , n 

Fig. 2. Example 2: Absolute errors 
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Abstract. We describe a generalization of the GMRES iterative method 
in which the residual vector is no longer minimized in the 2-norm but in 
a C-norm, where C is a symmetric positive definite matrix. The resulting 
iterative method called GGMRES is derived in detail and the minimizing
property is proven. 



1 Introduction 

We are interested in iterative methods for solving systems of linear equations 
of the form Au = b, where A is a large sparse nonsingular matrix. When A 
is symmetric positive definite, conjugate-gradient- type methods are often used 
and are fairly well understood. On the other hand, when A is nonsymmetric, the 
choice of iterative method is much more difficult. 

We consider a method similar to the Generalized Minimum Residual (GMRES) method that was introduced by Saad and Schultz [9] for solving a linear system where the coefficient matrix A is nonsymmetric. We describe a generalization of the GMRES method called “GGMRES.” In this procedure, the residual vector is minimized in a C-norm rather than the 2-norm for some symmetric positive definite (SPD) matrix C. When C = I, the GGMRES method reduces to the GMRES method. If one is able to find a matrix C so that CA is symmetric and nonsingular, then the GGMRES method simplifies (an upper Hessenberg matrix reduces to a tridiagonal matrix) and short recurrence relations can be used in the GGMRES algorithm. Additional details can be found in [2] and [3].

First, one chooses an SPD matrix $C$ and an initial approximation $u^{(0)}$ to the true solution, $u = A^{-1}b$. Then starting with the initial residual vector $r^{(0)} = b - Au^{(0)}$, one generates a sequence of vectors $w^{(0)}, w^{(1)}, \ldots$ which are mutually $C$-orthogonal. The iterates $u^{(1)}, u^{(2)}, \ldots$ are chosen so that for each $n$, the error vector $u^{(n)} - u^{(0)}$ is a linear combination of the vectors $w^{(0)}, \ldots, w^{(n-1)}$ and so that the $C$-norm of the $n$-th residual vector $r^{(n)} = b - Au^{(n)}$ is minimized. This can be done in a stable manner by the use of Givens rotations applied to a related linear system, which involves an upper Hessenberg matrix. If $C = I$, the procedure reduces to the standard GMRES
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method. If CA is symmetric, then the vectors can be determined by 

short recurrence relations. 

2 Generalized GMRES (GGMRES) Method 

First, we choose an SPD matrix $C$ and we generate a $C$-orthonormal basis for the Krylov space $K_n(r^{(0)}, A)$ by the simplified Arnoldi procedure as follows, with $u^{(0)}$ arbitrary.

■u;(o) = = b — 

(To = ||■ul^°^||c 



/cto 




for i = 


l,2,...,n- 


1 


for 


j = 0,l,... 


,i - 1 




bij = 




end for 






) = 




CTi = 








) = 





end for 



Here the $C$-norm of a vector $x$ is given by $\|x\|_C^2 = (x, x)_C = x^T C x$ using the $C$-inner product $(x, y)_C = (x, Cy) = x^T C y$.
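The following Python/NumPy sketch shows one way a $C$-orthonormal Arnoldi process and the resulting minimization could be organized. It is an illustration under stated assumptions, not the authors' algorithm: it uses a modified Gram-Schmidt ordering, dense matrices, and a generic least-squares solve in place of Givens rotations, and the function names `c_arnoldi` and `ggmres` are hypothetical. The array `H` stores the coefficient $b_{ij}$ in row $j$, column $i-1$, and $\sigma_i$ on the subdiagonal.

```python
import numpy as np

def c_arnoldi(A, C, r0, n):
    """Build W = [w0 ... wn] with W^T C W = I and H with A W[:, :n] = W H."""
    N = r0.size
    W = np.zeros((N, n + 1))
    H = np.zeros((n + 1, n))
    sigma0 = np.sqrt(r0 @ C @ r0)          # C-norm of the initial residual
    W[:, 0] = r0 / sigma0
    for i in range(1, n + 1):
        w = A @ W[:, i - 1]
        for j in range(i):                  # C-orthogonalize against w0..w(i-1)
            H[j, i - 1] = W[:, j] @ C @ w
            w = w - H[j, i - 1] * W[:, j]
        H[i, i - 1] = np.sqrt(w @ C @ w)    # sigma_i; assumed nonzero (no breakdown)
        W[:, i] = w / H[i, i - 1]
    return W, H, sigma0

def ggmres(A, C, b, u0, n):
    """n steps of the sketch: minimize ||b - A u||_C over u0 + span(w0..w(n-1))."""
    r0 = b - A @ u0
    W, H, sigma0 = c_arnoldi(A, C, r0, n)
    q = np.zeros(n + 1)
    q[0] = sigma0
    c, *_ = np.linalg.lstsq(H, q, rcond=None)   # min ||q - H c||_2
    return u0 + W[:, :n] @ c, W, H

rng = np.random.default_rng(0)
N, n = 8, 4
A = 4.0 * np.eye(N) + 0.5 * rng.standard_normal((N, N))
M = rng.standard_normal((N, N))
C = M @ M.T + N * np.eye(N)                     # SPD by construction
b = rng.standard_normal(N)
u, W, H = ggmres(A, C, b, np.zeros(N), n)
```

Choosing `C = np.eye(N)` in this sketch gives an ordinary Arnoldi process, in line with the remark that GGMRES reduces to GMRES for $C = I$.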

The above procedure is called Phase I and the following two basic relations are obtained:

$$A W_{n-1} = W_n H_n, \tag{1}$$

$$W_n^T C W_n = I, \tag{2}$$

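where

$$W_n = [\, w^{(0)} \ w^{(1)} \ \cdots \ w^{(n)} \,], \qquad H_n = \begin{bmatrix} b_{10} & b_{20} & b_{30} & \cdots & b_{n0} \\ \sigma_1 & b_{21} & b_{31} & \cdots & b_{n1} \\ 0 & \sigma_2 & b_{32} & \cdots & b_{n2} \\ \vdots & \ddots & \ddots & \ddots & \vdots \\ 0 & \cdots & 0 & \sigma_{n-1} & b_{n,n-1} \\ 0 & \cdots & 0 & 0 & \sigma_n \end{bmatrix}_{(n+1) \times n}.$$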


Next in Phase II, we obtain 
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••• c^, 
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( 4 ) 
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We now show how is determined. Evidently, we have 

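$$\begin{aligned} r^{(n)} &= b - Au^{(n)} \\ &= r^{(0)} - A W_{n-1} c^{(n)} \\ &= W_n q - W_n H_n c^{(n)} \\ &= W_n (q - H_n c^{(n)}) \end{aligned} \tag{5}$$

using Equation (1) and letting

$$r^{(0)} = W_n q, \qquad q = \sigma_0 e^{(1)} = \sigma_0 [1 \ 0 \ 0 \ \cdots \ 0]^T \in \mathbb{R}^{n+1}, \qquad \sigma_0 = \|r^{(0)}\|_C .$$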



Then we assume that we can apply $n-1$ Givens rotations $Q_1, Q_2, \ldots, Q_{n-1}$ to the Hessenberg matrix $H_n$



«i X X X . . . X 

0 «2 ’ ■ ' ■ ' ■ • 



Qn—l ' ' ' Q2Ql^n — 



X 



X 



0 

0 



0 



X 

^n,n— 1 






(n+l)xn . 



Here bn,n-i is the modified entry in this position of the matrix as a result of 
applying the Givens rotation matrices Qi, Q 2 , ■ ■ ■ , Qn-i- Next we use the n-th 
Givens rotation matrix 



1 



1 



Qn — 



1 



Cn 

Sn 




(n+l)x(ra+l) . 



(6) 



where c„ = s„ = -(Tn/a„, cr„ = and a„ = + cr^]^- 

Since cr„ = k Oj then a„ k 0. 

Using Equation (5), we consider 






If Q = QnQn-i ■ ■ ■ Q 2 Q 1 is the product of n Givens rotations, we let 



Rn — QHfi , 
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where i?„ is an (n + 1) x n upper triangular matrix with a zero last row 



Rn = 



r\i ri 2 ri3 

0 T22 T23 

0 



• 0 0 



Letting 

Qq = z= [zo zi 
we then solve the first n equations of 






rir. 



Tn-l-. 

T^n.r}. 



0 0 0 



(n+l)xn . 
^n—1 ^n] ; 



for $c^{(n)}$. Moreover, it follows that if $z_n$ is the last element of the vector $Qq$, then $z_n = \sigma_0 s_1 s_2 \cdots s_n$. Thus, we obtain $\|r^{(n)}\|_C$ for each iteration without having to compute the inner product directly. Finally, we compute $u^{(n)}$ by

$$u^{(n)} = u^{(0)} + W_{n-1} c^{(n)} .$$
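The role of the Givens rotations can be checked numerically. The sketch below is an illustration only: it applies rotations of the form $\begin{bmatrix} c_k & -s_k \\ s_k & c_k \end{bmatrix}$ with $s_k = -\sigma_k / a_k$ to a generic $(n+1) \times n$ upper Hessenberg matrix and to $q = \sigma_0 e^{(1)}$, using a randomly generated test matrix and a hypothetical function name `givens_reduce`. It then compares the last entry of $z = Qq$ with the product $\sigma_0 s_1 \cdots s_n$ and with the least-squares residual.

```python
import numpy as np

def givens_reduce(H, sigma0):
    """Rotate an (n+1) x n upper Hessenberg H to triangular form.

    Returns R = Q H, z = Q (sigma0 * e1) and the sines s_1..s_n.
    """
    R = np.array(H, dtype=float)
    n = R.shape[1]
    z = np.zeros(n + 1)
    z[0] = sigma0
    sines = []
    for k in range(n):
        b, sig = R[k, k], R[k + 1, k]      # diagonal and subdiagonal entries
        a = np.hypot(b, sig)               # assumed nonzero
        c, s = b / a, -sig / a
        G = np.array([[c, -s], [s, c]])    # annihilates the subdiagonal entry
        R[k:k + 2, :] = G @ R[k:k + 2, :]
        z[k:k + 2] = G @ z[k:k + 2]
        sines.append(s)
    return R, z, np.array(sines)

rng = np.random.default_rng(1)
n, sigma0 = 4, 2.0
H = np.triu(rng.standard_normal((n + 1, n)), k=-1)   # random upper Hessenberg
R, z, s = givens_reduce(H, sigma0)
c = np.linalg.solve(R[:n, :], z[:n])                 # first n equations
q = np.zeros(n + 1)
q[0] = sigma0
residual = np.linalg.norm(q - H @ c)                 # equals |z_n| up to rounding
```

Since the rotations are orthogonal, the residual norm is available as $|z_n|$ without forming $q - H_n c^{(n)}$ explicitly.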



We now prove the following theorem.

Theorem 1. Using the GGMRES method, the approximate solution $u^{(n)}$ for the exact solution $u$ of the linear system $Au = b$ minimizes the $C$-norm of the $n$th residual vector $r^{(n)} = b - Au^{(n)}$; namely,

$$\min_{u^{(n)} - u^{(0)} \in K_n(r^{(0)}, A)} \|r^{(n)}\|_C^2 .$$



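Proof. Using Equations (5) and (2), we have

$$\begin{aligned} \|r^{(n)}\|_C^2 &= (C r^{(n)}, r^{(n)}) = (C W_n (q - H_n c^{(n)}),\, W_n (q - H_n c^{(n)})) \\ &= (q - H_n c^{(n)})^T W_n^T C W_n (q - H_n c^{(n)}) \\ &= \|q - H_n c^{(n)}\|_2^2 . \end{aligned}$$

Thus, we have

$$\min_{u^{(n)} - u^{(0)} \in K_n(r^{(0)}, A)} \|r^{(n)}\|_C^2 = \min_{c^{(n)} \in \mathbb{R}^n} \|q - H_n c^{(n)}\|_2^2 .$$

Since $H_n$ can be factored as $Q^T R_n$, where the unitary matrix $Q$ is a product of several Givens rotations and $R_n$ is an $(n+1) \times n$ upper triangular matrix, then we have

$$\begin{aligned} \|q - H_n c^{(n)}\|_2^2 &= \|Q^T (Qq - R_n c^{(n)})\|_2^2 \\ &= (Qq - R_n c^{(n)})^T Q Q^T (Qq - R_n c^{(n)}) \\ &= \|Qq - R_n c^{(n)}\|_2^2 , \end{aligned}$$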
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using Q^Q = I. Since Qq = z = [zq zi • • • Zn-i Zn]'^ , we have 



min 



Q= min \\RnC^^'> - z 

c(^)£Rri 



2 

2 • 



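Since the last row of the matrix $R_n$ is zero, it can be written in the form

$$R_n = \begin{bmatrix} \tilde{R}_n \\ 0 \ \cdots \ 0 \end{bmatrix}.$$

We can write $\|R_n c^{(n)} - z\|_2^2$ as

$$\|\tilde{R}_n c^{(n)} - [z_0 \ z_1 \ \cdots \ z_{n-1}]^T\|_2^2 + |z_n|^2 .$$

If we choose $c^{(n)}$ such that $\tilde{R}_n c^{(n)} = [z_0 \ z_1 \ \cdots \ z_{n-1}]^T$, then $\|r^{(n)}\|_C$ is minimized and $\|r^{(n)}\|_C = |z_n|$. Thus, $u^{(n)} = u^{(0)} + W_{n-1} c^{(n)}$ is the approximate solution to the exact solution $u$ of $Au = b$, which minimizes $\|r^{(n)}\|_C$.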
Example 1 fCase n = 3). We now illustrate the algorithm for a small system. 
Using Equations (1), (3), and (4), we have 



AW 2 = W 3 R 3 

rc(2)] = [rc(o) w(^^jH3 , 

where 
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To get the least squares solution of 
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CO 
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1 




1^2 J 




0 



we apply two Givens rotations to both sides and obtain 

= -2 



rii X 
0 r-22 


X 

X 




Cq 

(3) 




1 

0 <-H 

b) 

1 


0 0 
0 0 


^33 

0 




Ci 

1*^2 J 




■22 

.^3_ 



(3) (3) (3) 

We then solve the first three equations for Cg ' ,c\ , and C 2 ■ Finally, we have 
y(3) _ y(0) _|_ (.G)^(O) _|_ _ 






David R. Kincaid et al. 



Example 2 (Case n = 3). We next show that $z_3 = \sigma_0 s_1 s_2 s_3$. Clearly from Equation (6), we have



Q — Q3Q2Q1 



'1 




'1 




1 

1 

1 


1 




C2 -S2 




Si Cl 


C3 -S3 




S2 C2 




1 


S3 C3 




1 




1 



It follows that the (3,1)-element of $Q$ is $s_3 s_2 s_1$ and we have $z_3 = \sigma_0 s_1 s_2 s_3$ since $z = Qq = Q \sigma_0 e^{(1)}$.



Notes and Comments. If we let C = I, we have the standard GMRES method 
(Saad and Schultz [9]). Moreover, if CA is symmetric and nonsingular, then 
the C-orthogonal procedure truncates and the upper Hessenberg matrix Hn 
reduces to a tridiagonal matrix. 

Additional details on the GGMRES method can be found in the disserta- 
tion of Chen [2] and the paper by Chen-Kincaid-Young [3]. Chen presents a 
wide range of numerical examples that illustrate the numerical behavior of vari- 
ous GMRES-type iterative methods and compares their rates of convergence to 
several other well-known iterative methods. 
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Abstract. This paper is concerned with the pure displacement prob- 
lem of planar linear elasticity. Our interest is focussed to a locking-free 
FEM approximation of the problem in the case when the material is almost incompressible. The approximation space is constructed using the
Crouzeix-Raviart linear finite elements. Choosing a proper hierarchical 
basis of this space we define an optimal order algebraic multilevel (AMLI) 
preconditioner for the related stiffness matrix. Local spectral analysis is 
applied to find the scaling parameter of the preconditioner as well as to 
estimate the related constants in the strengthened C.B.S. inequality. A 
set of numerical tests which illustrate the accuracy of the FEM solution, 
and the convergence rate of the AMLI PCG method is presented. 

Keywords: PCG, multilevel preconditioners, non-conforming FEM 

AMS Subject Classifications: 65F10, 65N30 



1 Introduction 

In this paper we consider the parameter dependent planar linear elasticity problem for almost incompressible material. It is known [9,5,7] that when the Poisson ratio $\nu$ tends to 0.5, the so-called locking phenomenon appears if low order conforming finite elements are used in the construction of the approximation space. Following [9,8], we use the Crouzeix-Raviart linear finite elements to get a locking-free FEM solution of the problem. Note that the straightforward FEM discretization works well for the pure displacement problem only [7]. The next important step is the construction of a locking-free solution method for the obtained linear algebraic system. Let us note that the condition number of the FEM stiffness matrix tends to infinity when $\nu \to 0.5$. This means that if, e.g., the preconditioned conjugate gradient (PCG) method is used as an iterative solver for the algebraic problem, then the relative condition number of the candidates for good preconditioners should be uniformly bounded with respect to the Poisson ratio.

An optimal order full multigrid algorithm for the pure displacement problem is 
presented in [6]. More recently, robust multigrid preconditioning for the prob- 
lem in primal variables, obtained by the selective and reduced integration (SRI) 
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method, was proposed and studied in [12]. The SRI method is a particular case 
of a more general mixed formulation which produces a locking-free FEM dis- 
cretization (see [9], and also [3] for some more recent results about the efficient 
solution of the related saddle point problems). 

Here we study an efficient application of the general framework of the algebraic 
multilevel iteration (AMLI) method as introduced by Axelsson and Vassilevski 
(see, e.g. [4,13]). The presented new results complete the authors' investigations of recent years devoted to the development of robust preconditioners for
the algebraic problem under consideration. A detailed study of the related two- 
level method was recently published in [11]. The corresponding hierarchical basis 
multilevel algorithm was first announced in [10]. 

The remainder of the paper is organized as follows. A short introduction to the 
Lame model of elasticity and the locking effect is given in the next section. The 
construction of the two-level and the AMLI preconditioners is described in §3. 
A model analysis of these preconditioners (obtained for a uniform mesh of half 
squares) is presented in §4. The last section contains numerical tests, which il- 
lustrate both the locking-free approximation properties of the Crouzeix-Raviart 
linear finite elements for the pure displacement problem, and the optimal con- 
vergence rate of the AMLI algorithm. Brief concluding remarks are also given 
at the end of the paper. 

2 Lame Model of Elasticity and the Locking Effect 

Several problems in computational mechanics can be written in the form 



where $A$ is a given well conditioned operator, $B$ is an operator with a non-trivial null-space, and $\varepsilon > 0$ is a small parameter. If we study the behavior of the solution when $\varepsilon \to 0$, we call (1) a parameter dependent problem [1].

Let us consider an elastic body occupying a bounded domain $\Omega \subset \mathbb{R}^2$. We are interested in finding the vector field of the displacements $u : \Omega \to \mathbb{R}^2$, when the field of volume forces $f : \Omega \to \mathbb{R}^2$ is given. Suppose that $f \in [L_2(\Omega)]^2$, and let $V = [H_0^1(\Omega)]^2$. Then, the weak formulation of the pure displacement problem with homogeneous boundary conditions $u|_{\partial\Omega} = 0$ reads (see [7] for more details)




( 1 ) 



$$\text{find } u \in V : \quad a(u, v) = F(v) \quad \forall v \in V, \tag{2}$$



where 




and 




Here $\varepsilon(u) = (\nabla u + (\nabla u)^T)/2$, and $\lambda > 0$ and $\mu > 0$ stand for the Lame coefficients, which can be expressed by the elasticity modulus $E > 0$ and the Poisson ratio $\nu \in [0, 1/2)$ as follows: $\lambda = E\nu / [(1+\nu)(1-2\nu)]$, $\mu = E / [2(1+\nu)]$.
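To see how these formulas behave near the incompressible limit, the following small Python sketch evaluates them; the function name `lame_coefficients` and the sample values of $E$ and $\nu$ are chosen here for illustration only.

```python
def lame_coefficients(E, nu):
    """Return (lambda, mu) from the elasticity modulus E and Poisson ratio nu."""
    if E <= 0.0 or not (0.0 <= nu < 0.5):
        raise ValueError("need E > 0 and 0 <= nu < 1/2")
    lam = E * nu / ((1.0 + nu) * (1.0 - 2.0 * nu))
    mu = E / (2.0 * (1.0 + nu))
    return lam, mu

# lambda grows without bound as nu approaches 1/2, while mu stays bounded
samples = [(nu, lame_coefficients(1.0, nu)) for nu in (0.25, 0.4, 0.49, 0.4999)]
lam_quarter, mu_quarter = lame_coefficients(1.0, 0.25)
```

The ratio $\lambda / \mu = 2\nu / (1 - 2\nu)$ is what makes the problem parameter dependent: it blows up as $\nu$ tends to $1/2$.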
The case $\nu \to 0.5$ is called almost incompressible, and so (2) belongs to the class of parameter dependent problems with $\varepsilon = (1 - 2\nu)$. Now suppose that $\Omega$ is a polygon, $\mathcal{T}_h$ is a regular family of triangulations of $\Omega$, and FEM is used to get a numerical solution of the considered elasticity problem. As was mentioned before, the locking effect appears for low order conforming FEM discretization of (2). This means that the relative error of the FEM solution is unbounded when $\nu \to 1/2$ for any fixed mesh parameter $h > 0$ (see [8] for more details). Fortunately, it turns out that locking can be overcome if the non-conforming Crouzeix-Raviart finite elements are used. Let $\mathcal{N}(\mathcal{T}_h)$ be the set of midpoints of the sides of the triangles $T \in \mathcal{T}_h$. First we define the scalar FEM space: 

$$V_{cr,h} = \{ v : v|_T \text{ is linear}; \ v \text{ is continuous in } \mathcal{N}(\mathcal{T}_h), \ v = 0 \text{ in } \mathcal{N}(\mathcal{T}_h) \cap \partial\Omega \}.$$

Now we introduce $V_h = [V_{cr,h}]^2$. Note that $V_h \not\subset [H^1(\Omega)]^2$, and hence $V_h \not\subset V$; that is, this finite element space is non-conforming. 

Since a pure displacement problem is under consideration, the bilinear form $a(\cdot, \cdot)$ is equivalent to the following one: 

$$a^s(u, v) = \int_\Omega \left[ (\lambda + \mu)(\nabla \cdot u)(\nabla \cdot v) + \mu \, \nabla u : \nabla v \right] dx. \qquad (3)$$

Note that the modification of the variational formulation based on the bilinear form (3) is of principal importance. The discrete version of $a^s(\cdot, \cdot)$ in the non-conforming case is defined by element-wise splitting of the integral, i.e. 

$$a_h^s(u, v) = \sum_{T \in \mathcal{T}_h} \int_T \left[ (\lambda + \mu)(\nabla \cdot u|_T)(\nabla \cdot v|_T) + \mu \, \nabla u|_T : \nabla v|_T \right] dx.$$

If $u_h$ is the solution of the discrete problem 

find $u_h \in V_h$: $\quad a_h^s(u_h, v_h) = F(v_h) \quad \forall v_h \in V_h$, 

then the following locking-free error estimate holds (see [8]): 

Theorem 1. There exists a constant $C_{\Omega,\theta}$ (independent of $h$, $\lambda$, $\mu$) such that 

$$\| u - u_h \|_h \le C_{\Omega,\theta} \, h \, \| f \|_{[L^2(\Omega)]^2},$$

where $\| \cdot \|_h := \sqrt{a_h^s(\cdot, \cdot)}$, and $\theta$ is the smallest angle in the triangulation. 

The standard computational procedure leads to a linear system of equations $A_h \mathbf{u}_h = \mathbf{f}_h$, where $A_h$ is the corresponding stiffness matrix. At this point we run into a discrete locking phenomenon, since the condition number $\kappa(A_h) \to \infty$ as $\nu \to 1/2$. Our next step is the construction of a locking-free preconditioner $M$ for $A_h$, such that $\kappa(M^{-1} A_h) = O(1)$ uniformly in $\nu$. 
Fig. 1. C.-R. FE: (a) triangle $e \in \mathcal{T}_H$; (b) refined macro-element $E \in \mathcal{T}_h$ 



3 From Two-Level to AMLI Preconditioner 



This section begins with a short presentation of the construction of the two-level algorithm as it was introduced and studied in [11]. Let $\mathcal{T}_h$ be a refinement of a coarser triangulation $\mathcal{T}_H$. Associated with $\mathcal{T}_h$ are the FEM space $V_h$ and the corresponding nodal basis stiffness matrix $A_h$. Observe that $V_H$ and $V_h$ are not nested as in the conforming case. 

Let $A_e$ be the element stiffness matrix corresponding to the triangle $e \in \mathcal{T}_H$, and $A_E$ be the macro-element stiffness matrix, where the macro-element $E \in \mathcal{T}_h$ is obtained by a regular bisection refinement of $e \in \mathcal{T}_H$ (see Fig. 1). Following the FEM assembling procedure we have $A_H = \mathrm{assembl}\{A_e\}_{e \in \mathcal{T}_H}$, $A_h = \mathrm{assembl}\{A_E\}_{E \in \mathcal{T}_h}$. Let us denote by $\varphi_E$ the macro-element vector of the nodal basis functions. In all local matrices the numbering of the nodes corresponds to Fig. 1. Now, we are ready to define locally the hierarchical two-level basis $\tilde\varphi_E$: 

$$\tilde\varphi_E = J_E \varphi_E, \qquad (4)$$

where the local transformation matrix $J_E$ consists of an identity block together with rows built from the pairs $(1, -1)$ and $(1, 1)$, which form the half-differences and half-sums of nodal basis functions. 



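Here, and in what follows: $I$ stands for the identity matrix of the appropriate size; all matrices and vectors related to the two-level basis are marked by tilde. The global two-level stiffness matrix reads: $\tilde A_h = \mathrm{assembl}\{\tilde A_E\}_{E \in \mathcal{T}_h}$, where $\tilde A_E = J_E A_E J_E^T$. The global transformation matrix $J$ is also assembled from the local matrices $J_E$. We now split and factorize $\tilde A_h$ into 2 x 2 block form 

$$\tilde A_h = \begin{pmatrix} \tilde A_{11;h} & \tilde A_{12;h} \\ \tilde A_{21;h} & \tilde A_{22;h} \end{pmatrix} = \begin{pmatrix} \tilde A_{11;h} & 0 \\ \tilde A_{21;h} & B_h \end{pmatrix} \begin{pmatrix} I & \tilde A_{11;h}^{-1} \tilde A_{12;h} \\ 0 & I \end{pmatrix}. \qquad (5)$$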



Here: the block $\tilde A_{11;h}$ corresponds to the interior nodal unknowns with respect to the macro-elements $E \in \mathcal{T}_h$; and $B_h$ stands for the Schur complement of this elimination step. Note that $\tilde A_{11;h}$ is a block-diagonal matrix with blocks which are 6 x 6 matrices, i.e. this elimination can be performed macro-element by macro-element. The next step of the construction is to approximate the matrix $B_h$, written again in 2 x 2 block form 

$$B_h = \begin{pmatrix} B_{11;h} & B_{12;h} \\ B_{21;h} & B_{22;h} \end{pmatrix} = \begin{pmatrix} B_{11;h} & 0 \\ B_{21;h} & S_h \end{pmatrix} \begin{pmatrix} I & B_{11;h}^{-1} B_{12;h} \\ 0 & I \end{pmatrix}, \qquad (6)$$



where the first pivot block $B_{11;h}$ corresponds to the two-level basis functions which are defined as half-differences of nodal basis functions (see (4)), and $S_h$ is the current Schur complement. It is important to note that $B_{22;h}$ has the same sparsity pattern as the true discretization matrix $A_H$ corresponding to the coarse triangulation $\mathcal{T}_H$. 
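
The block factorization used in (5) and (6) can be checked numerically on a toy matrix. The sketch below is an illustration only: it uses small hand-picked 2 x 2 blocks rather than finite element matrices, and all function names (`block_factors`, `assemble`, and the helpers) are assumptions introduced for this example. It forms the Schur complement $S = B_{22} - B_{21} B_{11}^{-1} B_{12}$ and verifies that the product of the two block-triangular factors reproduces the original matrix.

```python
def matmul(A, B):
    """Product of two dense matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def inv2(M):
    """Inverse of a 2 x 2 matrix."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def sub(A, B):
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]


def assemble(B11, B12, B21, B22):
    """Glue four 2 x 2 blocks into one 4 x 4 matrix."""
    top = [r1 + r2 for r1, r2 in zip(B11, B12)]
    bottom = [r1 + r2 for r1, r2 in zip(B21, B22)]
    return top + bottom


def block_factors(B11, B12, B21, B22):
    """Return the lower factor, the upper factor and the Schur complement S."""
    B11_inv = inv2(B11)
    S = sub(B22, matmul(B21, matmul(B11_inv, B12)))
    zero = [[0.0, 0.0], [0.0, 0.0]]
    eye = [[1.0, 0.0], [0.0, 1.0]]
    lower = assemble(B11, zero, B21, S)
    upper = assemble(eye, matmul(B11_inv, B12), zero, eye)
    return lower, upper, S


if __name__ == "__main__":
    B11 = [[4.0, 1.0], [1.0, 3.0]]
    B12 = [[1.0, 0.0], [0.0, 1.0]]
    B21 = [[1.0, 0.0], [0.0, 1.0]]
    B22 = [[5.0, 2.0], [2.0, 6.0]]
    lower, upper, S = block_factors(B11, B12, B21, B22)
    print(matmul(lower, upper))  # should reproduce the assembled 4 x 4 matrix
```

Replacing the exact pivot block by a cheaper approximation in both factors is the step that turns such a factorization into a preconditioner.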



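Definition 1 The two-level preconditioner is defined as $M_{2L} = J^{-1} \tilde M_{2L} J^{-T}$, where 

$$\tilde M_{2L} = \begin{pmatrix} \tilde A_{11;h} & 0 \\ \tilde A_{21;h} & \hat B_h \end{pmatrix} \begin{pmatrix} I & \tilde A_{11;h}^{-1} \tilde A_{12;h} \\ 0 & I \end{pmatrix}, \qquad \hat B_h = \begin{pmatrix} D_{11;h} & 0 \\ B_{21;h} & B_{22;h} \end{pmatrix} \begin{pmatrix} I & D_{11;h}^{-1} B_{12;h} \\ 0 & I \end{pmatrix},$$

and where: $D_{11;h} = \omega B_{11;h}^{(d)}$; $B_{11;h}^{(d)}$ stands for the diagonal part of $B_{11;h}$; $\omega > 0$ is a parameter. 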



Let $\omega$ be chosen so that $v^T B_{11;h} v \le v^T D_{11;h} v \le \delta \, v^T B_{11;h} v$; then the following estimate is a straightforward conclusion from the general result for the convergence of the two-level algorithms [2,13,11]: 







(7) 



Here $\gamma$ is the constant in the strengthened C.B.S. inequality corresponding to the 2 x 2 block-presentation (6) of $B_h$. 

Now, let us assume that the same uniform refinement procedure is used to construct a sequence of nested triangulations $\mathcal{T}_1 \subset \mathcal{T}_2 \subset \ldots \subset \mathcal{T}_\ell$. The final goal of this paper is to construct an AMLI preconditioner $M_{AMLI}$ for the stiffness matrix $A^{(\ell)}$ corresponding to $\mathcal{T}_\ell$, and to study its convergence behavior. 

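Definition 2 The AMLI preconditioner $M^{(k)}_{AMLI}$ is determined recursively, starting from $M^{(1)}_{AMLI}$ on the coarsest level and setting, for $k = 2, 3, \ldots, \ell$, 

$$M^{(k)}_{AMLI} = \left( J^{(k)} \right)^{-1} \tilde M^{(k)}_{AMLI} \left( J^{(k)} \right)^{-T}, \qquad (8)$$

where $\tilde M^{(k)}_{AMLI}$ has the block-factorized form of Definition 1, with the coarse-level block approximated through $M^{(k-1)}_{AMLI}$ and the polynomial $p_\beta$, and where: $D^{(k)}_{11} = \omega B^{(d,k)}_{11}$; $B^{(d,k)}_{11}$ is the diagonal part of $B^{(k)}_{11}$; $p_\beta$ is a properly scaled polynomial of degree $\beta$. 

Following the general scheme of the convergence analysis from [4] one can prove that: 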

Following the general scheme of the convergence analysis from [4] one can prove 
that: 



Lemma 1. The AMLI method (8) is of optimal computational complexity if 

$< \beta < 4$. 
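
The upper bound $\beta < 4$ in Lemma 1 can be motivated by a simple work count. The sketch below rests on modelling assumptions that are not statements from the paper: one application of the preconditioner on level $k$ costs a number of operations proportional to the number of unknowns $N_k$, $N_k$ grows by a factor of 4 under uniform refinement, and each level calls the next coarser level $\beta$ times. The function name `amli_work_ratio` is hypothetical. The lower bound on $\beta$, which controls the condition number, is not modelled here.

```python
def amli_work_ratio(levels, beta, refinement_ratio=4):
    """Work on the finest level divided by the number of finest-level unknowns.

    Model (an assumption): W_1 = N_1 and W_k = N_k + beta * W_{k-1},
    with N_k = refinement_ratio ** k.
    """
    work = float(refinement_ratio)  # W_1 = N_1
    for k in range(2, levels + 1):
        work = refinement_ratio ** k + beta * work
    return work / refinement_ratio ** levels


if __name__ == "__main__":
    for beta in (2, 3, 4, 5):
        ratios = [amli_work_ratio(levels, beta) for levels in (4, 8, 16)]
        print(beta, [round(r, 3) for r in ratios])
```

In this model the ratio equals the geometric sum $\sum_{j} (\beta/4)^j$, which stays bounded as the number of levels grows only if $\beta < 4$.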



4 Model Convergence Analysis 

The model problem is defined on a convex polygon $\Omega = \cup \{T : T \in \mathcal{T}_1\}$ under the additional assumption that $\mathcal{T}_1$ is obtained by diagonal bisection of square cells of a given uniform rectangular mesh, with mesh lines which are parallel to the coordinate axes. The considerations in this section are aimed at a quantitative analysis of the behavior of the constant in the strengthened C.B.S. inequality. The next two-level estimate was recently published in [11]. 

Theorem 2. The two-level constant in the strengthened C.B.S. inequality (for the model problem under consideration) satisfies the estimate 

$$\gamma \le \gamma_E \le 0.822\ldots \qquad \forall \nu \in [0, 1/2). \qquad (9)$$

The general approach for such estimates is based on a local analysis on a macro-element level. For the AMLI algorithm, this estimate is valid only at the first factorization step. Unlike the case when AMLI is applied to conforming linear finite elements, here the coarse grid element stiffness matrices are changed at each factorization step. The behavior of $\gamma$ when $k$ and $\nu$ are varied is presented 



Table 1. Numerical analysis of the C.B.S. constant $\gamma$ 

k    $\nu$ = 0.3000    $\nu$ = 0.4000    $\nu$ = 0.4900    $\nu$ = 0.4990    $\nu$ = 0.4999
1    0.786065          0.801174          0.820122          0.822405          0.822638
2    0.631551          0.725086          0.937080          0.993267          0.999322
3    0.277714          0.343513          0.655475          0.940351          0.994058
4    0.224511          0.239849          0.444969          0.632548          0.661470
5    0.165863          0.187030          0.291319          0.318613          0.338222
6    0.094187          0.092353          0.140757          0.193588          0.146164
7    0.085456          0.078540          0.083019          0.102814          0.126900
Table 2. Relative error stability for $\nu \to 1/2$ 

$\nu$       $\|u - u_h\|_{[L_2]^2} / \|f\|_{[L_2]^2}$      $\nu$         $\|u - u_h\|_{[L_2]^2} / \|f\|_{[L_2]^2}$
0.4       .3108249106503572      0.4999      .3771889077038727
0.49      .3695943747405575      0.49999     .3772591195613628
0.499     .3764879643773666      0.499999    .3772661419401481



Table 3. AMLI preconditioning of non-conforming FEM system ($\beta = 2$) 

$\ell$    N         $\nu$ = 0.3    $\nu$ = 0.4    $\nu$ = 0.49    $\nu$ = 0.499    $\nu$ = 0.4999    $\nu$ = 0.49999    $\nu$ = 0.499999
4    1472      13         13         12          13           13            13             13
5    6016      12         12         12          14           13            13             13
6    24320     12         12         12          12           13            13             13
7    97792     11         11         11          12           13            13             13
8    196096    11         11         11          12           12            13             13


in Table 1. The conclusion from the test data presented in the table is that: a) $\gamma$ strongly decreases with $k$ for moderate values of $\nu$; b) there is an oscillation of $\gamma$ when $\nu$ is near the incompressible limit, followed again by a stable decrease. 



5 Numerical Tests 

The numerical tests presented in this section illustrate the behavior of the FEM error as well as the optimal convergence rate of the AMLI algorithm when the size of the discrete problem is varied and $\nu \in [0, 1/2)$ tends to the incompressible limit. The simplest test problem in the unit square $\Omega = (0, 1)^2$ with $E = 1$ is considered. The right hand side corresponds to a given exact solution $u(x, y) = (\sin(\pi x) \sin(\pi y), \ y(y - 1)x(x - 1))$. The PCG algorithm is stopped by a relative criterion: the iteration ends when the ratio $(M^{-1} r^{i}, r^{i}) / (M^{-1} r^{0}, r^{0})$ falls below a tolerance prescribed through $\varepsilon$, where $r^{i}$ stands for the residual at the $i$-th iteration step. 
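
A relative stopping criterion of this kind can be sketched with a generic preconditioned conjugate gradient loop. The code below is a minimal illustration and not the authors' implementation: the 1D Laplacian test matrix, the Jacobi preconditioner, the tolerance value `tol` and all function names are assumptions chosen so that the example is self-contained.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def pcg(matvec, apply_prec, b, tol, max_iter=1000):
    """Preconditioned conjugate gradients with a zero initial guess.

    Stops when (M^{-1} r_i, r_i) / (M^{-1} r_0, r_0) < tol.
    Returns (solution, number_of_iterations).
    """
    x = [0.0] * len(b)
    r = list(b)
    z = apply_prec(r)
    p = list(z)
    rz = dot(r, z)
    rz0 = rz
    if rz0 == 0.0:
        return x, 0
    for it in range(1, max_iter + 1):
        Ap = matvec(p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = apply_prec(r)
        rz_new = dot(r, z)
        if rz_new / rz0 < tol:  # relative stopping criterion
            return x, it
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter


def laplacian_1d(v):
    """Matrix-vector product with tridiag(-1, 2, -1) (toy test matrix)."""
    n = len(v)
    return [2.0 * v[i] - (v[i - 1] if i > 0 else 0.0) - (v[i + 1] if i < n - 1 else 0.0)
            for i in range(n)]


def jacobi(r):
    """Diagonal (Jacobi) preconditioner for the toy matrix."""
    return [ri / 2.0 for ri in r]


if __name__ == "__main__":
    x, its = pcg(laplacian_1d, jacobi, [1.0] * 20, tol=1e-12)
    print("iterations:", its)
```

With a robust preconditioner in place of `jacobi`, the iteration count returned by such a loop is the quantity one would compare across mesh sizes and values of $\nu$.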

The relative FEM errors, given in Table 2, well illustrate the locking-free ap- 
proximation. Here i = 4, N = 1472, and e: = 10“®. This Table is presented here 
for completeness. It was first published in [11]. 

In Table 3, the number of iterations are presented as a measure of the robust- 
ness of the proposed two-level preconditioner. Here e = 10“^. The optimal order 
locking-free convergence rate of the AMLI algorithm is well expressed. 

In Table 4 a modification of the AMLI algorithm (8) is used, where two PCG inner iterations have been applied to stabilize the multilevel algorithm instead of the acceleration polynomial $[I - p_\beta(\cdot)]$. This approach was shown to be an even better candidate in the case under consideration, when the constant in the strengthened C.B.S. inequality varies during the AMLI factorization procedure. 

Remark 1. It is important to note once again that the application of the algorithm presented in this article is strictly restricted to the case of pure displacement. The more general case where Neumann boundary conditions are also 
Table 4. Modified AMLI preconditioning (fixed PCG iterations) 

$\ell$    N         $\nu$ = 0.3    $\nu$ = 0.4    $\nu$ = 0.49    $\nu$ = 0.499    $\nu$ = 0.4999    $\nu$ = 0.49999    $\nu$ = 0.499999
4    1472      10         10         11          10           10            10             10
5    6016      10         11         11          11           11            11             11
6    24320     10         11         11          11           11            11             11
7    97792     11         11         11          11           11            11             12
8    196096    10         11         11          11           11            11             12



assumed requires a modification of the variational formulation. Otherwise the second Korn's inequality does not hold in the case of low-order non-conforming FEM discretization (see [9] for more details for the 2D case). 
Abstract. Computationally efficient and numerically stable methods for solving Seemingly Unrelated Regression Equations (SURE) models are proposed. The iterative feasible generalized least squares estimator of SURE models where the regression equations have common exogenous variables is derived. At each iteration an estimator of the SURE model is obtained from the solution of a generalized linear least squares problem. The proposed methods, which have as a basic tool the generalized QR decomposition, are also found to be efficient in the general case where the number of linearly independent regressors is smaller than the number of observations. 



1 Introduction 

The basic computational formulae for deriving the estimators of Seemingly Unrelated Regression Equations (SURE) models involve Kronecker products and direct sums of matrices that make the solution of the models computationally expensive even for modest sized models. Therefore the derivation of numerically stable and computationally efficient methods is of great importance [3,6,17,18]. The SURE model is given by 

$$y_i = X_i \beta_i + u_i, \qquad i = 1, 2, \ldots, G, \qquad (1)$$

where $y_i \in \mathbb{R}^T$ is the endogenous vector, $X_i \in \mathbb{R}^{T \times k_i}$ is the exogenous matrix with full column rank, $\beta_i \in \mathbb{R}^{k_i}$ are the coefficients and $u_i \in \mathbb{R}^T$ is the disturbance vector, having zero mean and variance-covariance matrix $\sigma_{ii} I_T$. Furthermore, the covariance matrix of $u_i$ and $u_j$ is given by $\sigma_{ij} I_T$, i.e. contemporaneous disturbances are correlated. 

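In the compact form the SURE model can be written as 

$$\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_G \end{pmatrix} = \begin{pmatrix} X_1 & & & \\ & X_2 & & \\ & & \ddots & \\ & & & X_G \end{pmatrix} \begin{pmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_G \end{pmatrix} + \begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_G \end{pmatrix} \qquad (2)$$

or 

$$\mathrm{vec}(Y) = \left( \oplus_{i=1}^{G} X_i \right) \mathrm{vec}(\{\beta_i\}_G) + \mathrm{vec}(U), \qquad (3)$$

where $Y = (y_1, \ldots, y_G)$, $U = (u_1, \ldots, u_G)$, the direct sum of matrices $\oplus_{i=1}^{G} X_i$ is equivalent to the block diagonal matrix $\mathrm{diag}(X_1, \ldots, X_G)$, $\{\beta_i\}_G$ denotes the set of vectors $\beta_1, \ldots, \beta_G$ and $\mathrm{vec}(\cdot)$ is the vector operator which stacks one column under the other of its matrix or set of vectors argument. The disturbance term $\mathrm{vec}(U)$ has zero mean and dispersion matrix $\Sigma \otimes I_T$, where $\Sigma = [\sigma_{ij}]$ is symmetric and positive semidefinite and $\otimes$ denotes the Kronecker product operator [2,4,15]. That is, 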



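$$\Sigma \otimes I_T = \begin{pmatrix} \sigma_{11} I_T & \sigma_{12} I_T & \cdots & \sigma_{1G} I_T \\ \sigma_{21} I_T & \sigma_{22} I_T & \cdots & \sigma_{2G} I_T \\ \vdots & \vdots & & \vdots \\ \sigma_{G1} I_T & \sigma_{G2} I_T & \cdots & \sigma_{GG} I_T \end{pmatrix}.$$

For notational convenience the subscript $G$ in the set operator $\{\cdot\}$ is dropped and $\oplus_{i=1}^{G}$ is abbreviated to $\oplus_i$. 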

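The Best Linear Unbiased Estimator (BLUE) of $\mathrm{vec}(\{\beta_i\})$ is obtained from the solution of the Generalized Least Squares (GLS) problem 

$$\underset{\beta_1, \ldots, \beta_G}{\mathrm{argmin}} \ \| \mathrm{vec}(Y) - \mathrm{vec}(\{X_i \beta_i\}) \|^2_{\Sigma^{-1} \otimes I_T}, \qquad (4)$$

which is given by 

$$\mathrm{vec}(\{\hat\beta_i\}) = \left( (\oplus_i X_i^T)(\Sigma^{-1} \otimes I_T)(\oplus_i X_i) \right)^{-1} (\oplus_i X_i^T) \, \mathrm{vec}(Y \Sigma^{-1}). \qquad (5)$$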
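
The appearance of $\mathrm{vec}(Y\Sigma^{-1})$ in the GLS solution rests on the identity $(S \otimes I_T)\,\mathrm{vec}(Y) = \mathrm{vec}(Y S^T)$, applied with the symmetric matrix $S = \Sigma^{-1}$. The sketch below checks this identity on a small example; the helper names (`kron`, `vec`, `matvec`, and so on) and the sample matrices are assumptions made for illustration, and no attempt is made at efficiency.

```python
def kron(A, B):
    """Kronecker product of two dense matrices stored as lists of rows."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]


def vec(M):
    """Stack the columns of M one under the other."""
    return [M[i][j] for j in range(len(M[0])) for i in range(len(M))]


def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    Y = [[1, 2], [3, 4], [5, 6]]   # T = 3 observations, G = 2 equations
    S = [[2, -1], [-1, 3]]         # symmetric, playing the role of Sigma^{-1}
    lhs = matvec(kron(S, identity(3)), vec(Y))
    rhs = vec(matmul(Y, transpose(S)))
    print(lhs, rhs)
```

The right-hand side works with a $T \times G$ matrix only, whereas the left-hand side forms a $GT \times GT$ Kronecker product explicitly, which is the cost the text warns about.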



Often $\Sigma$ is unknown and an iterative procedure is used to obtain the Feasible GLS (FGLS) estimator. Initially, the regression equations of the SURE model are assumed to be unrelated, that is, the correlation among contemporaneous disturbances of the model is ignored and $\Sigma = I_G$. This is equivalent to computing the Ordinary Least Squares (OLS) estimator of $\{\beta_i\}$. Then, from the residuals a new estimator for $\Sigma$ is derived which is used in (5) to provide another estimator for the coefficients $\{\beta_i\}$. This process is repeated until convergence is achieved [16]. Generally, at the $i$th iteration the estimator of $\Sigma$ is computed by 

$$\hat\Sigma_{(i+1)} = \hat U_{(i)}^T \hat U_{(i)} / T, \qquad (6)$$

where $\hat U_{(i)} = (\hat u_1^{(i)} \ \cdots \ \hat u_G^{(i)})$ and $\hat u_j^{(i)} = y_j - X_j \hat\beta_j^{(i)}$ $(j = 1, \ldots, G)$ are the residuals of the $j$th regression equation. 
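
The covariance update inside the FGLS iteration can be sketched as follows, assuming the usual estimator that divides the cross-products of the residuals by the number of observations $T$. The function name `residual_covariance` and the toy residual matrix are assumptions made for illustration only.

```python
def residual_covariance(U):
    """Return U^T U / T for a T x G residual matrix U given as a list of rows."""
    T = len(U)
    G = len(U[0])
    return [[sum(U[t][i] * U[t][j] for t in range(T)) / T for j in range(G)]
            for i in range(G)]


if __name__ == "__main__":
    # Toy residuals: T = 3 observations, G = 2 regression equations.
    U = [[1, 0], [0, 2], [1, 2]]
    print(residual_covariance(U))
```

The off-diagonal entries of the result estimate the contemporaneous correlation that the initial OLS step ignores.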

The regression equations in a SURE model frequently have common exogenous variables (or common regressors). The purpose of this work is to propose computationally efficient methods which exploit this possibility. 
2 Numerical Solution of SURE Models 

The BLUE of the SURE model comes from the solution of the Generalized Linear Least Squares Problem (GLLSP) 

$$\underset{V, \{\beta_i\}}{\mathrm{argmin}} \ \|V\|_F^2 \quad \text{subject to} \quad \mathrm{vec}(Y) = (\oplus_i X_i) \, \mathrm{vec}(\{\beta_i\}) + (C \otimes I_T) \, \mathrm{vec}(V), \qquad (7)$$

where $\| \cdot \|_F$ denotes the Frobenius norm, $\Sigma = C C^T$, the upper triangular $C \in \mathbb{R}^{G \times G}$ has full rank and the random matrix $V$ is defined as $(C \otimes I_T) \, \mathrm{vec}(V) = \mathrm{vec}(U)$; that is, $V C^T = U$, which implies that $\mathrm{vec}(V)$ has zero mean and variance-covariance matrix $I_{TG}$ [8,11,12,13]. Without loss of generality it has been assumed that $\Sigma$ is non-singular. 
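
A factorization $\Sigma = C C^T$ with an upper triangular $C$ can be obtained by running the Cholesky recurrences from the last column to the first. The sketch below is one straightforward way to do this for a small positive definite matrix; it is an illustration under that assumption, the function name `upper_factor` is hypothetical, and it is not the updating strategy developed in this paper.

```python
import math


def upper_factor(S):
    """Return upper triangular C with S = C C^T for symmetric positive definite S."""
    n = len(S)
    C = [[0.0] * n for _ in range(n)]
    for j in range(n - 1, -1, -1):          # process columns from last to first
        d = S[j][j] - sum(C[j][k] ** 2 for k in range(j + 1, n))
        if d <= 0.0:
            raise ValueError("matrix is not positive definite")
        C[j][j] = math.sqrt(d)
        for i in range(j):                   # entries above the diagonal in column j
            s = S[i][j] - sum(C[i][k] * C[j][k] for k in range(j + 1, n))
            C[i][j] = s / C[j][j]
    return C


if __name__ == "__main__":
    Sigma = [[4.0, 2.0], [2.0, 3.0]]
    C = upper_factor(Sigma)
    print(C)
```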

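Consider the GQRD: 

$$Q^T (\oplus_i X_i) = \begin{pmatrix} \oplus_i R_i \\ 0 \end{pmatrix} \qquad (8a)$$

and 

$$Q^T (C \otimes I_T) P = \begin{pmatrix} W_{11} & W_{12} \\ 0 & W_{22} \end{pmatrix}, \qquad (8b)$$

where $K = \sum_{i=1}^{G} k_i$, the row blocks in (8a) and (8b) have $K$ and $GT - K$ rows, $R_i \in \mathbb{R}^{k_i \times k_i}$ and $W_{22}$ are upper triangular, and $Q, P \in \mathbb{R}^{GT \times GT}$ are orthogonal [1,14]. Using (8) the GLLSP (7) can be written as 

$$\underset{\{\tilde v_i\}, \{\hat v_i\}, \{\beta_i\}}{\mathrm{argmin}} \ \sum_{i=1}^{G} \left( \|\tilde v_i\|^2 + \|\hat v_i\|^2 \right) \quad \text{subject to} \quad \begin{pmatrix} \mathrm{vec}(\{\tilde y_i\}) \\ \mathrm{vec}(\{\hat y_i\}) \end{pmatrix} = \begin{pmatrix} \oplus_i R_i \\ 0 \end{pmatrix} \mathrm{vec}(\{\beta_i\}) + \begin{pmatrix} W_{11} & W_{12} \\ 0 & W_{22} \end{pmatrix} \begin{pmatrix} \mathrm{vec}(\{\tilde v_i\}) \\ \mathrm{vec}(\{\hat v_i\}) \end{pmatrix}, \qquad (9)$$

where 

$$Q^T \mathrm{vec}(Y) = \begin{pmatrix} \mathrm{vec}(\{\tilde y_i\}) \\ \mathrm{vec}(\{\hat y_i\}) \end{pmatrix} \quad \text{and} \quad P^T \mathrm{vec}(V) = \begin{pmatrix} \mathrm{vec}(\{\tilde v_i\}) \\ \mathrm{vec}(\{\hat v_i\}) \end{pmatrix},$$

with the upper and lower parts having $K$ and $GT - K$ elements, respectively. 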



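From (9) it follows that $\mathrm{vec}(\{\hat v_i\}) = W_{22}^{-1} \mathrm{vec}(\{\hat y_i\})$ and $\tilde v_i = 0$. Thus, the solution of the SURE model comes from solving the triangular system 

$$\begin{pmatrix} \mathrm{vec}(\{\tilde y_i\}) \\ \mathrm{vec}(\{\hat y_i\}) \end{pmatrix} = \begin{pmatrix} \oplus_i R_i & W_{12} \\ 0 & W_{22} \end{pmatrix} \begin{pmatrix} \mathrm{vec}(\{\hat\beta_i\}) \\ \mathrm{vec}(\{\hat v_i\}) \end{pmatrix}. \qquad (10)$$

Notice that $W_{11}$ is not used. Furthermore, for deriving the iterative FGLS, the RQD of $Q^T (C \otimes I_T)$ in (8b) is the most costly operation, as this needs to be recomputed for a different $C$ at each iteration. 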
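The matrix $Q$ in (8) is defined as 

$$Q = \begin{pmatrix} \oplus_i \tilde Q_i & \oplus_i \hat Q_i \end{pmatrix} = \begin{pmatrix} \tilde Q_1 & & & \hat Q_1 & & \\ & \ddots & & & \ddots & \\ & & \tilde Q_G & & & \hat Q_G \end{pmatrix},$$

where 

$$Q_i^T X_i = \begin{pmatrix} R_i \\ 0 \end{pmatrix}, \qquad Q_i = \begin{pmatrix} \tilde Q_i & \hat Q_i \end{pmatrix},$$

with $\tilde Q_i$ and $\hat Q_i$ having $k_i$ and $T - k_i$ columns, is the QRD of $X_i$ $(i = 1, \ldots, G)$, $\tilde Q_i^T y_i = \tilde y_i$ and $\hat Q_i^T y_i = \hat y_i$. 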

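The computation of (8b) proceeds in two stages. The first stage computes 

$$Q^T (C \otimes I_T) Q = \begin{pmatrix} \tilde W_{11} & \tilde W_{12} \\ \tilde W_{21} & \tilde W_{22} \end{pmatrix},$$

with row and column blocks of sizes $K$ and $GT - K$, where $\tilde W_{ij}$ $(i, j = 1, 2)$ is block upper triangular. Furthermore, the main block-diagonals of $\tilde W_{12}$ and $\tilde W_{21}$ are zero, and the $i$th $(i = 1, \ldots, G)$ blocks of the main diagonal of $\tilde W_{11}$ and $\tilde W_{22}$ are given by $C_{ii} I_{k_i}$ and $C_{ii} I_{T - k_i}$, respectively. The second stage computes the RQD 

$$\begin{pmatrix} \tilde W_{21} & \tilde W_{22} \end{pmatrix} \tilde P = \begin{pmatrix} 0 & W_{22} \end{pmatrix} \qquad (11a)$$

and 

$$\begin{pmatrix} \tilde W_{11} & \tilde W_{12} \end{pmatrix} \tilde P = \begin{pmatrix} W_{11} & W_{12} \end{pmatrix}. \qquad (11b)$$

Thus, in (8b) $P = Q \tilde P$. Sequential and parallel strategies for computing the RQD (11) have been described in [5,6]. Figure 1 illustrates the diagonally-based strategy, where $G = 3$. At each step of this method a block-diagonal of $\tilde W_{21}$ is annihilated by a series of simultaneous factorizations. Each factorization (denoted by an arc) annihilates a block and affects two block-columns of $(\tilde W_{11}^T \ \tilde W_{21}^T)^T$ and $(\tilde W_{12}^T \ \tilde W_{22}^T)^T$. The block triangular structure of $\tilde W_{11}$ and $\tilde W_{22}$ is preserved throughout the annihilation process, while $\tilde W_{12}$ becomes full except for its last diagonal block, which remains zero. 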
Fig. 1. Computing (11) using the diagonally-based method, where $G = 3$ (panels: initial matrix, Stage 1, Stage 2, final matrix) 



3 SURE Model with Common Regressors 

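Consider now the SURE model with common regressors 

$$\mathrm{vec}(Y) = (\oplus_i X^d S_i) \, \mathrm{vec}(\{\beta_i\}) + \mathrm{vec}(U), \qquad (12)$$

where $X^d \in \mathbb{R}^{T \times K^d}$ denotes the matrix consisting of the $K^d$ distinct regressors, $K^d \le K$, $S_i \in \mathbb{R}^{K^d \times k_i}$ is a selection matrix that comprises relevant columns of the $K^d \times K^d$ identity matrix and the exogenous matrix $X_i$ $(i = 1, \ldots, G)$ is defined as $X_i = X^d S_i$ [7,9]. Let the QR decomposition of $X^d$ be given by 

$$Q_d^T X^d = \begin{pmatrix} R^d \\ 0 \end{pmatrix}, \qquad Q_d = \begin{pmatrix} \tilde Q_d & \hat Q_d \end{pmatrix}, \qquad (13)$$

where $R^d$ has $r$ rows, $\tilde Q_d$ and $\hat Q_d$ have $r$ and $T - r$ columns, and $\mathrm{rank}(X^d) = r < T$. Premultiplying (12) from the left by the orthogonal matrix $Q_D^T = (I_G \otimes \tilde Q_d \ \ I_G \otimes \hat Q_d)^T$ gives 

$$\begin{pmatrix} \mathrm{vec}(\tilde Y) \\ \mathrm{vec}(\hat Y) \end{pmatrix} = \begin{pmatrix} \oplus_i R^d S_i \\ 0 \end{pmatrix} \mathrm{vec}(\{\beta_i\}) + \begin{pmatrix} \mathrm{vec}(\tilde U) \\ \mathrm{vec}(\hat U) \end{pmatrix}, \qquad (14)$$

where 

$$Q_d^T Y = \begin{pmatrix} \tilde Y \\ \hat Y \end{pmatrix} \quad \text{and} \quad Q_d^T U = \begin{pmatrix} \tilde U \\ \hat U \end{pmatrix},$$

with $\tilde Y$, $\tilde U$ having $r$ rows and $\hat Y$, $\hat U$ having $T - r$ rows. 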

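The covariance matrix of the disturbance term $\mathrm{vec}((\tilde U^T \ \hat U^T)^T)$ is given by 

$$\begin{pmatrix} \Sigma \otimes I_r & 0 \\ 0 & \Sigma \otimes I_{T-r} \end{pmatrix}. \qquad (15)$$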
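Thus, the SURE model estimators $\{\hat\beta_i\}$ arise from the solution of the reduced sized model 

$$\mathrm{vec}(\tilde Y) = (\oplus_i R^d S_i) \, \mathrm{vec}(\{\beta_i\}) + \mathrm{vec}(\tilde U), \qquad (16)$$

where the variance-covariance matrix of $\mathrm{vec}(\tilde U)$ is given by $\Sigma \otimes I_r$. 

From (14) and (16) it follows that the estimator $\hat\Sigma_{(i+1)}$ in (6) is equivalent to 

$$\hat\Sigma_{(i+1)} = \left( \tilde U_{(i)}^T \tilde U_{(i)} + \hat Y^T \hat Y \right) / T, \qquad (17)$$

where $\tilde U_{(i)}$ is the residual matrix of (16) at the $i$th iteration. Thus, the (upper triangular) Cholesky factor of $\hat\Sigma_{(i+1)}$, denoted by $C_{(i+1)}$, can be computed from the QLD 

$$Q_C^T \begin{pmatrix} \tilde U_{(i)} \\ \hat Y \end{pmatrix} = \begin{pmatrix} 0 \\ \sqrt{T} \, C_{(i+1)}^T \end{pmatrix}, \qquad (18)$$

where the zero block has $T - G$ rows, $\hat\Sigma_{(i+1)} = C_{(i+1)} C_{(i+1)}^T$ and $Q_C \in \mathbb{R}^{T \times T}$ is orthogonal. However, if the QLD of $\hat Y$ is given by 

$$Q_Y^T \hat Y = \begin{pmatrix} 0 \\ L_Y \end{pmatrix}, \qquad (19)$$

then $C_{(i+1)}$ in (18) can be derived from the updated QLD 

$$\tilde Q_C^T \begin{pmatrix} \tilde U_{(i)} \\ L_Y \end{pmatrix} = \begin{pmatrix} 0 \\ \sqrt{T} \, C_{(i+1)}^T \end{pmatrix}. \qquad (20)$$

Notice that if $K^d > T - r$, then $L_Y$ in (19) is lower trapezoidal. Algorithm 1 summarizes the iterative procedure for computing the FGLS estimator of SURE models with common regressors. 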

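Consider now the case where there are no common regressors and the total number of regressors $K$ is smaller than the number of observations $T$. That is, $X^d = (X_1 \ \cdots \ X_G) \in \mathbb{R}^{T \times K}$, $K^d = K$, 

$$Q_d^T X^d = \begin{pmatrix} R \\ 0 \end{pmatrix} \qquad (21a)$$

and 

$$R = \begin{pmatrix} R^{(1)} & R^{(2)} & \cdots & R^{(G)} \end{pmatrix}, \qquad (21b)$$

where $R \in \mathbb{R}^{K \times K}$ is upper triangular, the block $R^{(i)}$ has $k_i$ columns, and in (16) $R S_i = R^{(i)}$. As in the case of SURE models with common regressors, the computational burden of deriving the iterative FGLS estimator can be reduced significantly if the original model is transformed to the smaller-in-size SURE model (16). 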
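Algorithm 1 Iterative estimation of the SURE model with common regressors 

1: Compute the QRD $Q_d^T X^d = \begin{pmatrix} R^d \\ 0 \end{pmatrix}$ and $Q_d^T Y = \begin{pmatrix} \tilde Y \\ \hat Y \end{pmatrix}$ 
2: Compute the QLD $Q_Y^T \hat Y = \begin{pmatrix} 0 \\ L_Y \end{pmatrix}$ 
3: Compute the QRDs $Q_j^T R^d S_j = \begin{pmatrix} R_j \\ 0 \end{pmatrix}$ and $Q_j^T \tilde Y_j = \begin{pmatrix} \tilde y_j \\ \hat y_j \end{pmatrix}$ $(j = 1, \ldots, G)$, where $\tilde Y_j$ is the $j$th column of $\tilde Y$ 
4: Let $C_{(0)} = I_G$, $\mathrm{vec}(\{\hat\beta_j^{(0)}\}) = 0$ and $\mathrm{vec}(\{\tilde v_j\}) = 0$ 
5: for $i = 1, 2, \ldots$ do 
6:   if $i > 1$ then 
7:     Compute $(\oplus_j Q_j)^T (C_{(i-1)} \otimes I_r)(\oplus_j Q_j) = \begin{pmatrix} \tilde W_{11} & \tilde W_{12} \\ \tilde W_{21} & \tilde W_{22} \end{pmatrix}$ 
8:     Compute the RQD $\begin{pmatrix} \tilde W_{21} & \tilde W_{22} \end{pmatrix} \tilde P = \begin{pmatrix} 0 & W_{22} \end{pmatrix}$ 
9:     Compute $\begin{pmatrix} \tilde W_{11} & \tilde W_{12} \end{pmatrix} \tilde P = \begin{pmatrix} W_{11} & W_{12} \end{pmatrix}$ 
10:    Solve the triangular system $W_{22} \, \mathrm{vec}(\{\hat v_j\}) = \mathrm{vec}(\{\hat y_j\})$ 
11:    Compute $\mathrm{vec}(\{\tilde v_j\}) = W_{12} \, \mathrm{vec}(\{\hat v_j\})$ 
12:  end if 
13:  Solve the triangular systems $R_j \hat\beta_j^{(i)} = (\tilde y_j - \tilde v_j)$ $(j = 1, \ldots, G)$ 
14:  Compute the residuals $\tilde u_j^{(i)} = \tilde Y_j - R^d S_j \hat\beta_j^{(i)}$ $(j = 1, \ldots, G)$ 
15:  Compute the updated QLD $\tilde Q_C^T \begin{pmatrix} \tilde U_{(i)} \\ L_Y \end{pmatrix} = \begin{pmatrix} 0 \\ \sqrt{T} \, C_{(i)}^T \end{pmatrix}$, $\tilde U_{(i)} = (\tilde u_1^{(i)} \ \cdots \ \tilde u_G^{(i)})$ 
16: end for until $C_{(i)} = C_{(i-1)}$ and 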




4 Conclusions 

A numerically and computationally efficient method has been proposed to solve the SURE model with common regressors. The method is based on the GLLSP approach, which does not require any matrix inversion and can derive the BLUE of the SURE model when $\Sigma$ is singular [8,10]. The computation of the iterative FGLS estimator requires the solution of SURE models where the covariance matrix is re-estimated at each step. Thus, at each iteration step the RQD in (8b) for fixed $Q$ and different $C$ is computed. It has been shown how to transform the model to a smaller-in-size one. With this transformation both the computational cost and the memory requirements for computing the RQD in (8b) are reduced significantly. Furthermore, this approach is found to be efficient also in the case where there are no common regressors and the number of regressors $K$ is smaller than the number of observations $T$. 

Currently the complexity analysis of the algorithm, parallel strategies for solving the GQRD (8) and the adaptation of these numerical methods to solve other linear econometric models are being investigated. 
Abstract. The aim of this work is to present a short review of the application of the boundary collocation technique to some problems in fluid mechanics. The steps used to find the interaction between a wall and a sphere moving axisymmetrically towards the flat wall in a micropolar fluid are outlined to illustrate the workability of the method. This problem occurs in modeling flow problems in microdevices as well as in human joints. 



1 Introduction 

Boundary methods are usually understood as numerical procedures which require the use of trial functions satisfying the differential equation and which reduce the boundary conditions to an approximate form. There are two main possibilities to formulate boundary methods: one is based on the use of boundary integral equations, and the second one uses a system of trial functions. Methods based on trial functions are typically known as instances of the method of weighted residuals (MWR). Here, the trial functions are used as the basis functions for the truncated series expansion of the solution. The choice of the trial functions is one of the features which distinguish MWR from finite element and finite difference methods. The boundary collocation method (BCM) belongs to the class of MWR and is the most primitive version of this method. Its main disadvantage is that it is applicable only to linear problems. A review of applications of BCM in the mechanics of continuous media to date can be found in the review article by Kolodziej [10]. 

The Stokes equations in fluid mechanics are linear and often used to describe creeping flows that appear in microhydrodynamics and biomechanics. These flows occur in colloids, suspension rheology, aerosols, microfabricated fluid systems (e.g. pumps, valves, microchannels, computer chips) and bio-flows. In the past few years several important advances have been made in the numerical treatment of some Stokes flow problems by application of collocation techniques. For instance, a solution of the Stokes equations was presented by Skalak and co-workers [20] for several different flow problems involving an infinite array of identical particles. This method was popularized by the work of Gluckman et al. [6], who solved the flow past finite assemblages of particles of an arbitrary shape. They examined the flows past finite arrays of axisymmetric bodies such as spheres and spheroids, which conform to special natural coordinate systems. Over the subsequent decade BCM has been used to solve a wide range of problems in Stokes flow. A few illustrative examples include an arbitrary convex body of revolution in [7], multiple spheres in a cylinder [17] and two spheroids in a uniform stream [18]. Ganatos and coworkers made major modifications to the theory and extended it to handle a variety of non-axisymmetric creeping flow problems with planar symmetry where the boundaries conform to more than a single orthogonal co-ordinate system. In [3] they studied the quasi-steady time-dependent motion of three or more spheres settling under gravity in vertical planar configurations. Next the method was also extended to bounded flow problems. In [4,5] the motion of a sphere and spheroids was examined and the effect of the walls on the hydrodynamic quantities of the sphere motion was studied for various flow geometries. Solutions for axisymmetric and three-dimensional motions of a torus in the presence of the walls were obtained in [11,12]. 

The conjunction of BCM with the boundary integral method makes it possible to solve such problems as: a sphere in a circular orifice [1], or the hydrodynamic interaction of a three-dimensional finite cluster of arbitrarily sized spherical particles [9], to cite a few. A fairly complete overview of the range of problems that have been successfully tackled by this method during the period 1978-1990 is available in the review article by Weinbaum et al. [21]. Conjunction of the collocation method with the perturbation method made it possible to calculate the resistance coefficient of a spherical particle moving in the presence of a deformed wall [15]. The algorithm proposed there can be applied to a wide class of bodies, the shape of which can be described in separable coordinates (ellipsoid, torus, spheroid, and sphere). All results cited above concern Newtonian fluids. The first solution for Stokes flow past a sphere in a bounded flow of a non-Newtonian fluid, namely a micropolar fluid, was derived by Kucaba-Pietal [14] (1999). 

In general, the BCM is a very efficient tool for a class of Stokes flows involving interactions between particles of simple shape. A cardinal rule for the application of the collocation technique in solving Stokes flow problems is that the velocity disturbance produced by each co-ordinate boundary may be represented by an ordered sequence of fundamental solutions appropriate to the constant orthogonal surface to be described. These fundamental solutions for the velocity field are known for rectangular, cylindrical, spherical [16] and spheroidal [8] co-ordinates. The fundamental solution of the Stokes equation in toroidal coordinates was found by Kucaba-Pietal (1985) [12]. The coefficients that appear in the fundamental solutions have to be calculated from the boundary conditions. The series which represent the solution are truncated, and the boundary conditions are not applied exactly on the whole body but only at some carefully chosen points, the collocation points. For more complicated regions (for example for bounded flows past a sphere), using the boundary conditions imposed on the velocities along both confining walls we are able to invert analytically the Fourier-Bessel transform of the fundamental solution which represents the disturbance produced by the walls. In this manner, the original mixed co-ordinate, infinite domain boundary value problem is reduced to a much simpler finite domain problem in which only the two infinite arrays of unknown coefficients, which appear in the fundamental solution describing the disturbance of the moving body, need to be determined so as to satisfy the appropriate boundary conditions on the surface of the body. 

The difficulty in the construction of a collocation technique is not in the formulation, which is conceptually simple, but in the detailed development of the truncation. As was demonstrated in the appendix to Gluckman et al., the numerical solution can oscillate and become unstable as the number of collocation points is increased if an inappropriate set of fundamental solutions is used. 

The aim of this work is to illustrate the power of the boundary collocation technique by outlining the steps used to find the interaction between a wall and a sphere moving axisymmetrically towards the flat wall in a micropolar fluid. This problem occurs in microdevices as well as in human joints. 



2 Formulation of the Problem 

Let us consider a quasi-steady flow field of an incompressible micropolar fluid [2] due to a translational axisymmetric motion of a sphere $S_a$ of radius $a$ towards the wall. The separation between the sphere and the wall is denoted by $d$ (see Figure 1). 

Fig. 1. Geometry of the flow 

In the polar coordinate system $(r, \theta, z)$ with the origin in the center of the moving sphere, the surface of the wall is described as $z = -c$; $c = d + a$. The translational velocity of the sphere $S_a$ is $(0, 0, U)$. The fluid at infinity is at rest. The flow is at low Reynolds number. Because of the axisymmetric geometry of the flow, the stream function $\Psi(r, z)$ can be used. 

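The equations of motion describing this flow are the Stokes equations [14], and in terms of the stream function they read: 

$$-(\mu + \kappa) L_1^2 \Psi + \kappa L_1(r\omega) = 0, \qquad (1)$$

$$\gamma L_1(r\omega) + \kappa L_1 \Psi - 2\kappa r \omega = 0. \qquad (2)$$

In these equations $\omega$ is the microrotation vector. The positive constants $\mu$, $\kappa$, $\gamma$ characterize the isotropic properties of the micropolar fluid. $L_1$ is the generalized axisymmetric Stokesian operator: 

$$L_1 = \frac{\partial^2}{\partial r^2} - \frac{1}{r} \frac{\partial}{\partial r} + \frac{\partial^2}{\partial z^2}. \qquad (3)$$

After elimination of the microrotation vector $\omega$ from equations (1), (2) we arrive at: 

$$L_1^2 (L_1 - \lambda^2) \Psi = 0, \qquad (4)$$

with the microrotation given by: 

$$\omega = \frac{1}{2r} \left[ L_1 \Psi + \frac{\gamma(\mu + \kappa)}{\kappa^2} L_1^2 \Psi \right], \qquad (5)$$

and the constant $\lambda^2$ defined as: 

$$\lambda^2 = \frac{\kappa(2\mu + \kappa)}{\gamma(\mu + \kappa)}. \qquad (6)$$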

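The boundary conditions for $\Psi$ and $\omega$ are, on the sphere $S_a$: 

(7) 

$$\omega = \frac{\alpha_1}{2} \, \mathrm{rot}\, V, \qquad (8)$$

and on the wall $z = -c$: 

$$\Psi = 0, \qquad (9)$$

$$\omega = \frac{\alpha_2}{2} \, \mathrm{rot}\, V, \qquad (10)$$

where the constants $\alpha_1, \alpha_2 > 0$. 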
3 Algorithm for Receiving the Flow Field 

The technique described below is based on the use of the fundamental solution of 
the Stokes equation and the application of the reciprocal Hankel transformation. 
As a consequence the solution can be expressed using a series in which the 
unknown constants appear. The algorithm for determining flow field can be 
summarized as follows: 

1. First , the stream function ^ is decomposed into two parts 

tZ/ = !f^+!f2 (11) 

a) !Fi is fundamental solution represents an infinite series containing all of 
the simply separable solutions of Eqs. (1-2) in the spherical coordinates. 
These solutions are regular in the flow field and given by the formula [14]: 

W, = ^(S„p-"+i + + A„4_i(pA))J„(C) (12) 



where 

— C = cos{0) 

— 4(C) is the Gegenbauer function of the first kind of order n and 
degree - 

— 4- 1 Bessel functions 

— (p, 9, C) are sperical coordinates measured from the center of the 
sphere. 

$B_n$, $A_n$, and $D_n$ are unknown constants which will be determined from the
equations resulting from satisfying the no-slip boundary conditions on
the surface of the sphere in the presence of the confining wall.

b) $\Psi_2$ is a fundamental solution of Eqs. (1-2) in cylindrical coordinates
and represents an integral of all of the separable solutions which
produce finite velocities everywhere in the flow field. It is given by
the Fourier-Bessel integral [14]:



$$\Psi_2 = \int_0^\infty \left[B(\alpha)e^{-\alpha z} + D(\alpha)e^{-\alpha z}\alpha z + G(\alpha)e^{-\delta z}\right] J_1(\alpha r)\, r\, d\alpha \qquad (13)$$

where

— $\delta = \sqrt{\lambda^2 + \alpha^2}$,

— $B(\alpha)$, $D(\alpha)$, $G(\alpha)$ are unknown functions of the separation variable $\alpha$,

— $J_1$ denotes the Bessel function of the first kind of order unity.

The disturbances produced by the sphere along the wall can be completely
cancelled by the solution (13) through the proper choice
of the functions $B(\alpha)$, $D(\alpha)$, $G(\alpha)$.
2. Second, we apply the boundary conditions on the wall (9-10) after
replacing $\Psi_1$ and $\Psi_2$ by their series (12) and integral (13) representations,
respectively. As a result we get equations which can be easily inverted, and the
integration can be carried out by applying Hankel transforms. We are now able
to express the unknown functions $B(\alpha)$, $D(\alpha)$, $G(\alpha)$ through the coefficients of
the series (12), and the original problem is reduced to one in the infinite flow domain.
Thus the radial $v_r$ and axial $v_z$ velocities of the fluid flow and the microrotation $\omega$
can be rewritten in terms of the unknown constants $B_n$, $A_n$, and $D_n$.

3. Third, we truncate the infinite series which appear in the formulas defining
the velocity and the microrotation. In order to obtain a unique solution, the
boundary conditions on the sphere (7-8) are applied at a finite number of
discrete points on the sphere. Then we solve the resulting linear system of equations
numerically to find $B_n$, $A_n$, and $D_n$. At this stage the solution is
known.

4 Force 

Very useful for the considered problem is the expression for the force acting on
a sphere moving axisymmetrically in a micropolar fluid in terms of the stream
function, derived by Ramkissoon and Majumdar [19]. It reads:



5 Numerical Results and Conclusions 

The algorithm was implemented in Fortran and run on a PC with a 160 MHz
Pentium processor. The scheme for spacing the collocation points on the surface
of the sphere was based on the paper by Ganatos [5]. A unique feature of the
approach is that the convergence and the accuracy of the solution can be
controlled simply by selecting a proper trial set of points on the surface
of the sphere. To study this algorithm, a series of calculations of the force (14) was
performed for various rheological parameters of the fluid and various values of the
non-dimensional distance dis between the sphere and the wall, defined as
dis = d/a. To investigate the influence of the wall, we computed the wall correction
factor WCF as a function of dis. The WCF is defined as the ratio of the
calculated force to the force acting on the sphere in unbounded flow. Some results
are summarized in Table 1.

The results show that the force f acting on a translating sphere behaves
similarly in micropolar and classical fluids, but for given values of dis and a the force f
increases with the ratio K = k/μ. The influence of the rheological properties
of the fluid on the force can thus be clearly observed. Summarizing, the following
conclusions can be drawn:




(14) 
Table 1. Drag correction factor WCF for the sphere moving towards a wall in
micropolar fluid

dis     k/μ = 2    k/μ = 1.5    k/μ = 0.5    k/μ = 0 (Newtonian)
1       20.0       17.3         13.34        8.71
1.5     17.81      12.72        5.95         2.32
2       9.33       6.21         3.73         1.82
3       5.42       4.32         2.65         1.43
4       4.85       3.63         1.82         1.21
5       3.83       2.72         1.64         1.01
6       3.51       2.4          1.43         1.01
7       3.36       2.23         1.35         1.001
8       2.75       1.89         1.33         1.001
9       2.51       1.85         1.32         1.001
10      2.49       1.83         1.32         1.001

— Results show that the force acting on the moving body depends on the rhe- 
ological properties of micropolar fluid and distance of the body from the wall. 

— The area of an active interaction between a body and a wall is the most 
important factor, which increases the drag. 
Abstract. We consider linear constant coefficient differential-algebraic equations (DAEs) $Ax'(t) + Bx(t) = f(t)$ where $A$, $B$ are square matrices and $A$ is singular. If $\det(\lambda A + B)$ with $\lambda \in \mathbb{C}$ is not identically zero, the system of DAEs is solvable and can be separated into two uncoupled subsystems. One of them can be solved analytically and the other one is a system of ordinary differential equations (ODEs). We discretize the ODEs by boundary value methods (BVMs) and solve the linear system by using the generalized minimal residual (GMRES) method with Strang-type block-circulant preconditioners. It was shown that the preconditioners are nonsingular when the BVM is $A_{\nu,\mu-\nu}$-stable, and the eigenvalues of preconditioned matrices are clustered. Therefore, the number of iterations for solving the preconditioned systems by the GMRES method is bounded by a constant that is independent of the discretization mesh. Numerical results are also given.

Keywords: GMRES method, Strang-type block-circulant preconditioner, 
convergence rate, clustered spectrum, DAEs, ODEs, BVMs 

AMS(MOS) Subject Classifications: 65F10, 65N22, 65L05, 65F15, 

15A18 

1 Introduction to DAE Solver 

Consider the linear DAEs

$$\begin{cases} Ax'(t) + Bx(t) = f(t), & t \in (t_0, T], \\ x(t_0) = z, \end{cases} \qquad (1)$$

where $A$, $B$ are $n \times n$ matrices and $A$ is singular. Problems of this kind arise in
a wide variety of applications in electrical engineering and control theory, see [4].

A matrix pencil is defined by $\lambda A + B$ with $\lambda \in \mathbb{C}$. A pencil is said to be
regular if $\det(\lambda A + B)$ is not identically zero. When $\lambda A + B$ is regular, then the
* Authors are supported by the research grant No. RG010/99-00S/JXQ/FST from the 
University of Macau. 
equation (1) is solvable and there exist two nonsingular matrices $P$ and $Q$ such that

$$PAQ = \begin{bmatrix} I & 0 \\ 0 & N \end{bmatrix}, \qquad PBQ = \begin{bmatrix} G & 0 \\ 0 & I \end{bmatrix}.$$

Here the sum of the matrix sizes of $N$ and $G$ is $n$ and $N$ is a nilpotent matrix,
i.e., there exists a positive integer $\nu$ such that $N^{\nu} = 0$ and $N^{\nu-1} \ne 0$, see [2].
To compute the matrices $P$ and $Q$, we have the following constructive approach
given in [7]:

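(i) Let $B_1 = cA + B$ be nonsingular for some $c \in \mathbb{C}$. Then

$$B_1^{-1}(\lambda A + B) = B_1^{-1}(B_1 + (\lambda - c)A) = I + (\lambda - c)B_1^{-1}A$$

for all $\lambda \in \mathbb{C}$.

(ii) Let $R$ be an invertible matrix such that $R^{-1}B_1^{-1}AR$ is in Jordan form.
Here, $R$ can be found by using the “Jordan” command in Maple after loading the
“linalg” package. By interchanging the columns of $R$, we can assume that
$R^{-1}B_1^{-1}AR = \operatorname{diag}\{J_1, J_0\}$. Here $J_1$ and $J_0$ are Jordan matrices where all the
main diagonal entries of $J_1$ are nonzero and all the main diagonal entries of $J_0$ are zero.
Therefore,

$$R^{-1}(I + (\lambda - c)B_1^{-1}A)R = \begin{bmatrix} I + (\lambda - c)J_1 & 0 \\ 0 & (I - cJ_0) + \lambda J_0 \end{bmatrix}.$$

(iii) Then compute

$$\begin{bmatrix} I & 0 \\ 0 & (I - cJ_0)^{-1} \end{bmatrix} \begin{bmatrix} I + (\lambda - c)J_1 & 0 \\ 0 & (I - cJ_0) + \lambda J_0 \end{bmatrix} = \begin{bmatrix} I + (\lambda - c)J_1 & 0 \\ 0 & I + \lambda(I - cJ_0)^{-1}J_0 \end{bmatrix}.$$

(iv) Since $J_0$ is nilpotent and $(I - cJ_0)^{-1}$ commutes with $J_0$, the matrix
$(I - cJ_0)^{-1}J_0$ is also nilpotent. Let $E$ be an invertible matrix such that
$E^{-1}(I - cJ_0)^{-1}J_0E = N$ is in Jordan form. Then we have

$$\begin{bmatrix} J_1^{-1} & 0 \\ 0 & E^{-1} \end{bmatrix} \begin{bmatrix} I + (\lambda - c)J_1 & 0 \\ 0 & I + \lambda(I - cJ_0)^{-1}J_0 \end{bmatrix} \begin{bmatrix} I & 0 \\ 0 & E \end{bmatrix} = \begin{bmatrix} J_1^{-1} + (\lambda - c)I & 0 \\ 0 & I + \lambda N \end{bmatrix} = \begin{bmatrix} G & 0 \\ 0 & I \end{bmatrix} + \lambda \begin{bmatrix} I & 0 \\ 0 & N \end{bmatrix},$$

where $G = J_1^{-1} - cI$.

(v) Let $P$ be the product of all the matrices used to multiply the matrix
pencil $\lambda A + B$ on the left in steps (i)-(iv) and let $Q$ be the product of all the
matrices used to multiply the matrix pencil on the right in steps (i)-(iv). We have

$$P(\lambda A + B)Q = \lambda \begin{bmatrix} I & 0 \\ 0 & N \end{bmatrix} + \begin{bmatrix} G & 0 \\ 0 & I \end{bmatrix}.$$

The matrices $P$ and $Q$ are the desired ones.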
Remark 1. Using this method to construct the matrices $P$ and $Q$ is only efficient
when the system size is small.

Applying the coordinate changes $P$ and $Q$ to the DAEs in (1), we have

$$\begin{cases} y_1' + G y_1 = g_1(t), \\ N y_2' + y_2 = g_2(t), \end{cases} \qquad (2)$$

where $Q^{-1}x = [y_1^T, y_2^T]^T$ and $Pf = [g_1^T, g_2^T]^T$. The first equation in (2) is a
system of ODEs and a solution exists for any initial value of $y_1$. The second
equation has only one solution

$$y_2(t) = \sum_{i=0}^{\nu-1} (-1)^i N^i g_2^{(i)}(t),$$

where $g_2^{(i)}(t)$ denotes the $i$-th order derivative of $g_2(t)$ with respect to $t$.
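The closed-form solution of the second equation in (2) can be checked numerically. The following is a minimal sketch, not part of the original paper: the nilpotent matrix, the polynomial right-hand side and the helper names `deriv` and `y2_coef` are illustrative assumptions. Each component of $g_2$ is stored as a row of polynomial coefficients, so derivatives are exact.

```python
import numpy as np
from numpy.polynomial import polynomial as P

# Nilpotent matrix with N^3 = 0 (index nu = 3); chosen only for illustration.
N = np.array([[0., 0., 0.],
              [1., 0., 0.],
              [0., 1., 0.]])
nu = 3

# Hypothetical right-hand side g2(t): row r holds the coefficients of the
# r-th component in increasing powers of t.
g2_coef = np.array([[1., 2., 0., 1.],    # 1 + 2t + t^3
                    [0., 1., 3., 0.],    # t + 3t^2
                    [2., 0., 0., 0.]])   # 2

def deriv(coef, k):
    """k-th derivative of every row polynomial, zero-padded to the same width."""
    out = np.zeros_like(coef)
    for r in range(coef.shape[0]):
        d = P.polyder(coef[r], k)
        out[r, :len(d)] = d
    return out

def y2_coef(N, g_coef, nu):
    """Coefficients of y2(t) = sum_{i=0}^{nu-1} (-1)^i N^i g2^{(i)}(t)."""
    total = np.zeros_like(g_coef)
    Ni = np.eye(N.shape[0])          # holds N^i
    for i in range(nu):
        total += ((-1) ** i) * (Ni @ deriv(g_coef, i))
        Ni = Ni @ N
    return total

y2 = y2_coef(N, g2_coef, nu)
# Residual of N y2' + y2 - g2, compared coefficient by coefficient.
residual = N @ deriv(y2, 1) + y2 - g2_coef
print(np.abs(residual).max())
```

The sum telescopes because $N^{\nu} = 0$, which is why no initial condition can be imposed on $y_2$.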

In the remainder of this paper, we concentrate on the first equation in (2) 
with a given initial condition. 



2 The Matrix Forms of BVMs 



Now, we consider the following general initial value problem (IVP),

$$\begin{cases} y'(t) = J_m y(t) + g(t), & t \in (t_0, T], \\ y(t_0) = z, \end{cases} \qquad (3)$$

where $y(t), g(t) : \mathbb{R} \to \mathbb{R}^m$, and $J_m \in \mathbb{R}^{m \times m}$.

BVMs are methods based on linear multistep formulae (LMF) for solving
ODEs, see [3]. For a given IVP (3), a BVM approximates its solution by means
of a discrete boundary value problem. By using a $\mu$-step LMF over a uniform
mesh $t_j = t_0 + jh$, $j = 0, \ldots, s$, with $h = (T - t_0)/s$, we have

$$\sum_{i=-\nu}^{\mu-\nu} \alpha_{i+\nu}\, y_{n+i} = h \sum_{i=-\nu}^{\mu-\nu} \beta_{i+\nu}\, f_{n+i}, \qquad n = \nu, \ldots, s - \mu + \nu. \qquad (4)$$

Here, $y_n$ is the discrete approximation to $y(t_n)$, $f_n = J_m y_n + g_n$ and $g_n = g(t_n)$.

The BVM in (4) must be used with $\nu$ initial conditions and $\mu - \nu$ final
conditions. Since the initial condition in (3) provides only one value, we have
to provide $\mu - 1$ additional equations:

$$\sum_{i=0}^{\mu} \alpha_i^{(j)} y_i = h \sum_{i=0}^{\mu} \beta_i^{(j)} f_i, \qquad j = 1, \ldots, \nu - 1, \qquad (5)$$

and

$$\sum_{i=0}^{\mu} \alpha_{\mu-i}^{(j)} y_{s-i} = h \sum_{i=0}^{\mu} \beta_{\mu-i}^{(j)} f_{s-i}, \qquad j = s - \mu + \nu + 1, \ldots, s. \qquad (6)$$

508 Siu-Long Lei and Xiao-Qing Jin 



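By combining (4), (5) and (6), we obtain a linear system $My = b$ where

$$M = G \otimes I_m - hH \otimes J_m, \qquad (7)$$

$y = [y_0^T, \ldots, y_s^T]^T \in \mathbb{R}^{(s+1)m}$, and $b = e_1 \otimes z + h(H \otimes I_m)g$ with $e_1 = [1, 0, \ldots, 0]^T \in \mathbb{R}^{s+1}$
and $g = [g_0^T, \ldots, g_s^T]^T \in \mathbb{R}^{(s+1)m}$. The matrix $G \in \mathbb{R}^{(s+1)\times(s+1)}$ is defined by:

$$G = \begin{bmatrix}
1 & \cdots & 0 & & & \\
\alpha_0^{(1)} & \cdots & \alpha_\mu^{(1)} & & & \\
\vdots & & \vdots & & & \\
\alpha_0^{(\nu-1)} & \cdots & \alpha_\mu^{(\nu-1)} & & & \\
\alpha_0 & \cdots & \alpha_\mu & & & \\
 & \alpha_0 & \cdots & \alpha_\mu & & \\
 & & \ddots & & \ddots & \\
 & & & \alpha_0 & \cdots & \alpha_\mu \\
 & & & \alpha_0^{(s-\mu+\nu+1)} & \cdots & \alpha_\mu^{(s-\mu+\nu+1)} \\
 & & & \vdots & & \vdots \\
 & & & \alpha_0^{(s)} & \cdots & \alpha_\mu^{(s)}
\end{bmatrix},$$

and $H \in \mathbb{R}^{(s+1)\times(s+1)}$ in (7) is defined similarly by using $\{\beta_i^{(j)}\}$ instead of $\{\alpha_i^{(j)}\}$
in $G$ for all $i = 1, 2, \ldots, \mu$ and $j = 1, 2, \ldots, s$, and the first row of $H$ is zero.
The advantage of BVMs is that they have much better stability properties
than traditional initial value methods, see [3].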



3 Construction of Preconditioner 

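The following preconditioner for (7) is proposed:

$$S = s(G) \otimes I_m - h\,s(H) \otimes J_m,$$

where $s(G) \in \mathbb{R}^{(s+1)\times(s+1)}$ is defined by

$$s(G) = \begin{bmatrix}
\alpha_\nu & \cdots & \alpha_\mu & & & \alpha_0 & \cdots & \alpha_{\nu-1} \\
\vdots & \ddots & & \ddots & & & \ddots & \vdots \\
\alpha_0 & & \ddots & & \ddots & & & \alpha_0 \\
 & \ddots & & \ddots & & \ddots & & \\
 & & \alpha_0 & & \ddots & & \ddots & \\
\alpha_\mu & & & \ddots & & \ddots & & \alpha_\mu \\
\vdots & \ddots & & & \ddots & & \ddots & \vdots \\
\alpha_{\nu+1} & \cdots & \alpha_\mu & & & \alpha_0 & \cdots & \alpha_\nu
\end{bmatrix} \qquad (8)$$

and $s(H) \in \mathbb{R}^{(s+1)\times(s+1)}$ is defined similarly by using $\{\beta_i\}$ instead of $\{\alpha_i\}$ in
$s(G)$. The sequences $\{\alpha_i\}_{i=0}^{\mu}$ and $\{\beta_i\}_{i=0}^{\mu}$ are the coefficients in (4). We note
that $S$ is the Strang-type block-circulant preconditioner proposed in [5].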
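The structure of $s(G)$ and $S$ can be illustrated with a short sketch. It is only a sketch under stated assumptions: the coefficient vectors `alpha` and `beta`, the matrix `J` and the sizes are hypothetical and do not correspond to any particular BVM, and the boundary rows of $G$ are ignored so that only the Toeplitz part is compared with its Strang circulant.

```python
import numpy as np

def toeplitz_band(coef, nu, n):
    """n x n banded Toeplitz matrix with coef[nu] on the main diagonal:
    entry (r, c) equals coef[c - r + nu] whenever that index is valid."""
    T = np.zeros((n, n))
    for r in range(n):
        for c in range(n):
            k = c - r + nu
            if 0 <= k < len(coef):
                T[r, c] = coef[k]
    return T

def strang_circulant(coef, nu, n):
    """Circulant matrix with the same central diagonals, wrapped around.
    Its first row is (coef[nu], ..., coef[mu], 0, ..., 0, coef[0], ..., coef[nu-1])."""
    first_row = np.zeros(n)
    for k, a in enumerate(coef):
        first_row[(k - nu) % n] = a
    return np.array([np.roll(first_row, r) for r in range(n)])

# Hypothetical data (assumptions for illustration only).
alpha = np.array([1.0, -4.0, 3.0, 0.5])   # alpha_0, ..., alpha_mu with mu = 3
beta = np.array([0.0, 0.0, 2.0, 0.0])     # beta_0, ..., beta_mu
nu, n, h = 2, 8, 0.1
J = np.array([[-2.0, 1.0],
              [0.0, -3.0]])                # stands in for J_m with m = 2

G_toep = toeplitz_band(alpha, nu, n)
sG = strang_circulant(alpha, nu, n)
sH = strang_circulant(beta, nu, n)

# Block-circulant preconditioner S = s(G) (x) I_m - h s(H) (x) J_m.
S = np.kron(sG, np.eye(2)) - h * np.kron(sH, J)

# The circulant differs from the Toeplitz part only in the corner entries.
print(np.linalg.matrix_rank(sG - G_toep), S.shape)
```

The low rank of the difference between the circulant and the Toeplitz part is what underlies the clustering of the spectrum of the preconditioned matrix.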

The invertibility of $S$ depends on the stability of the BVM that we used to
discretize (3). The stability of a BVM is closely related to two characteristic
polynomials defined as follows:

$$\rho(z) = z^{\nu} \sum_{j=-\nu}^{\mu-\nu} \alpha_{j+\nu} z^{j} \quad\text{and}\quad \sigma(z) = z^{\nu} \sum_{j=-\nu}^{\mu-\nu} \beta_{j+\nu} z^{j}. \qquad (9)$$

Definition 1 ([3]). Consider a BVM with the characteristic polynomials $\rho(z)$
and $\sigma(z)$ defined by (9). The region

$$\mathcal{D}_{\nu,\mu-\nu} = \{q \in \mathbb{C} : \rho(z) - q\sigma(z) \text{ has } \nu \text{ zeros inside } |z| = 1 \text{ and } \mu - \nu \text{ zeros outside } |z| = 1\}$$

is called the region of $A_{\nu,\mu-\nu}$-stability of the given BVM. Moreover, the BVM is
said to be $A_{\nu,\mu-\nu}$-stable if $\mathbb{C}^- = \{q \in \mathbb{C} : \mathrm{Re}(q) < 0\} \subseteq \mathcal{D}_{\nu,\mu-\nu}$.

Theorem 1 ([5]). If the BVM for (3) is $A_{\nu,\mu-\nu}$-stable and $h\lambda_k \in \mathcal{D}_{\nu,\mu-\nu}$,
where $\lambda_k$ ($k = 1, \ldots, m$) are the eigenvalues of $J_m$, then the preconditioner
$S = s(G) \otimes I_m - h\,s(H) \otimes J_m$ is nonsingular. In particular, $S$ is nonsingular if
$\lambda_k \in \mathbb{C}^-$.

It is well known that if the spectrum of the preconditioned system is clustered,
then the GMRES method applied to the preconditioned system will
converge very fast.

Theorem 2 ([5]). All the eigenvalues of the preconditioned matrix $S^{-1}M$ are
1 except for at most $2m\mu$ outliers. The GMRES method, when applied to
the preconditioned system $S^{-1}My = S^{-1}b$, will converge in at most $2m\mu + 1$
iterations in exact arithmetic.

Regarding the operation cost of the method, we refer to [5].

4 Numerical Example

In this section, we compare the Strang-type block-circulant preconditioner with
other preconditioners by solving the subsystem of ODEs extracted from a system
of DAEs. All the experiments are performed in MATLAB with machine precision
$10^{-16}$. The GMRES method [6] is employed to solve linear systems. We use the
MATLAB-provided M-file “gmres” (see MATLAB on-line documentation) in
our implementation. In our example, the zero vector is the initial guess and
the stopping criterion is $\|r_q\|_2 / \|r_0\|_2 < 10^{-6}$ where $r_q$ is the residual after $q$
iterations.
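Example 1. Consider

$$\begin{cases} Ax'(t) + Bx(t) = 0, & t \in (0, 1], \\ x(0) = [1, 1, 1, 1, 1, 1, 1]^T, \end{cases}$$

where

$$A = \begin{bmatrix}
50 & 114 & 95 & 140 & 129 & 91 & 43 \\
101 & 198 & 149 & 155 & 223 & 183 & 138 \\
97 & 206 & 156 & 197 & 187 & 156 & 87 \\
82 & 185 & 148 & 164 & 156 & 129 & 81 \\
82 & 202 & 167 & 186 & 201 & 180 & 114 \\
111 & 226 & 193 & 197 & 229 & 198 & 138 \\
32 & 122 & 107 & 100 & 115 & 100 & 74
\end{bmatrix}, \qquad
B = \begin{bmatrix}
79 & 156 & 158 & 209 & 188 & 69 & 47 \\
87 & 256 & 161 & 162 & 241 & 203 & 162 \\
168 & 264 & 203 & 272 & 223 & 78 & 52 \\
180 & 260 & 189 & 229 & 255 & 142 & 111 \\
135 & 295 & 243 & 250 & 282 & 200 & 158 \\
188 & 357 & 268 & 298 & 337 & 261 & 185 \\
53 & 167 & 141 & 88 & 196 & 174 & 166
\end{bmatrix}.$$

Then there are two invertible matrices

$$P^{-1} = \begin{bmatrix}
9 & 0 & 1 & 3 & 2 & 4 & 6 \\
2 & 8 & 4 & 8 & 1 & 8 & 3 \\
6 & 4 & 9 & 0 & 0 & 5 & 8 \\
4 & 6 & 9 & 1 & 7 & 2 & 5 \\
8 & 7 & 4 & 2 & 4 & 6 & 7 \\
7 & 9 & 8 & 1 & 9 & 8 & 4 \\
4 & 7 & 0 & 6 & 4 & 0 & 3
\end{bmatrix}, \qquad
Q^{-1} = \begin{bmatrix}
1 & 3 & 6 & 7 & 4 & 1 & 0 \\
1 & 8 & 8 & 3 & 6 & 9 & 8 \\
6 & 8 & 6 & 8 & 6 & 2 & 1 \\
3 & 5 & 3 & 5 & 7 & 2 & 2 \\
5 & 4 & 2 & 3 & 9 & 8 & 6 \\
1 & 8 & 3 & 7 & 5 & 7 & 2 \\
6 & 8 & 5 & 5 & 8 & 1 & 4
\end{bmatrix}$$

such that

$$PAQ = \begin{bmatrix} I_4 & 0 \\ 0 & N \end{bmatrix}, \qquad PBQ = \begin{bmatrix} C & 0 \\ 0 & I_3 \end{bmatrix},$$

where

$$N = \begin{bmatrix} 0 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix} \quad\text{and}\quad
C = \begin{bmatrix} 2 & 0 & 0 & 0 \\ -1 & 2 & 0 & 0 \\ 0 & -1 & 2 & 0 \\ 0 & 0 & -1 & 2 \end{bmatrix}.$$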


Now, we show the efficiency of solving the following IVP,

$$\begin{cases} y'(t) = -Cy(t), & t \in (0, 1], \\ y(0) = [22, 43, 37, 27]^T. \end{cases}$$



The third order generalized backward differentiation formula is used to solve 
this system of ODEs. The formulae and the additional initial and final equations 
can be found in [3]. 
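The reduction of Example 1 to this IVP can be reproduced numerically. The sketch below is not part of the original experiments (which were run in MATLAB); it assumes the entries of $P^{-1}$, $Q^{-1}$, $N$ and $C$ as read from Example 1, rebuilds $A$ and $B$ from the canonical form, and computes the transformed initial value as the first four components of $Q^{-1}x(0)$. The helper `block_diag2` is a hypothetical name.

```python
import numpy as np

# P^{-1} and Q^{-1} as read from Example 1 (entries should be checked against the source).
P_inv = np.array([[9, 0, 1, 3, 2, 4, 6],
                  [2, 8, 4, 8, 1, 8, 3],
                  [6, 4, 9, 0, 0, 5, 8],
                  [4, 6, 9, 1, 7, 2, 5],
                  [8, 7, 4, 2, 4, 6, 7],
                  [7, 9, 8, 1, 9, 8, 4],
                  [4, 7, 0, 6, 4, 0, 3]], dtype=float)
Q_inv = np.array([[1, 3, 6, 7, 4, 1, 0],
                  [1, 8, 8, 3, 6, 9, 8],
                  [6, 8, 6, 8, 6, 2, 1],
                  [3, 5, 3, 5, 7, 2, 2],
                  [5, 4, 2, 3, 9, 8, 6],
                  [1, 8, 3, 7, 5, 7, 2],
                  [6, 8, 5, 5, 8, 1, 4]], dtype=float)
N = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0]], dtype=float)
C = np.array([[2, 0, 0, 0], [-1, 2, 0, 0], [0, -1, 2, 0], [0, 0, -1, 2]], dtype=float)

def block_diag2(X, Y):
    """Return the block-diagonal matrix diag(X, Y) for two square blocks."""
    k = X.shape[0]
    Z = np.zeros((k + Y.shape[0], k + Y.shape[0]))
    Z[:k, :k] = X
    Z[k:, k:] = Y
    return Z

# A = P^{-1} diag(I_4, N) Q^{-1} and B = P^{-1} diag(C, I_3) Q^{-1}.
A = P_inv @ block_diag2(np.eye(4), N) @ Q_inv
B = P_inv @ block_diag2(C, np.eye(3)) @ Q_inv

# Initial value of the ODE subsystem: first four components of Q^{-1} x(0).
x0 = np.ones(7)
y0 = (Q_inv @ x0)[:4]

print(A[0], B[0], y0)
```

Since $PAQ$ and $PBQ$ are block diagonal, the first block of $Q^{-1}x$ satisfies $y' + Cy = 0$, which is the IVP stated above.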

Table 1 lists the number of iterations required for convergence of the GMRES 
method with different preconditioners. In the table, I means no preconditioner is 
used and S denotes the Strang-type block-circulant preconditioner defined as in 
(8). For a comparison, we introduce T. Chan’s preconditioner and Bertaccini’s 
preconditioner, see [1]. T. Chan’s block-circulant preconditioner T is defined as 



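$$T = c(G) \otimes I_m - h\,c(H) \otimes J_m,$$

where the diagonals $\hat{\alpha}_j$ of $c(G)$ are given by

$$\hat{\alpha}_j = \left(1 - \frac{j}{s+1}\right)\alpha_{j+\nu} + \frac{j}{s+1}\,\alpha_{j+\nu-(s+1)}, \qquad j = 0, \ldots, s, \qquad (10)$$

and the diagonals $\hat{\beta}_j$ of $c(H)$ are defined similarly by replacing $\alpha_{j+\nu}$ by $\beta_{j+\nu}$
and $\alpha_{j+\nu-(s+1)}$ by $\beta_{j+\nu-(s+1)}$ in (10). Bertaccini’s block-circulant preconditioner
$P$ is defined as

$$P = \tilde{G} \otimes I_m - h\tilde{H} \otimes J_m,$$

where the diagonals $\tilde{\alpha}_j$ of $\tilde{G}$ are given by

$$\tilde{\alpha}_j = \left(1 + \frac{j}{s+1}\right)\alpha_{j+\nu} + \frac{j}{s+1}\,\alpha_{j+\nu-(s+1)}, \qquad j = 0, \ldots, s, \qquad (11)$$

and the diagonals $\tilde{\beta}_j$ of $\tilde{H}$ are defined similarly by replacing $\alpha_{j+\nu}$ by $\beta_{j+\nu}$ and
$\alpha_{j+\nu-(s+1)}$ by $\beta_{j+\nu-(s+1)}$ in (11).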

We see from Table 1 that when s increases, the numbers of iterations required 
for convergence stay almost the same when preconditioners are used. However, 
the number of iterations increases when no preconditioner is used. It is also 
clear that the Strang-type block-circulant preconditioner performs better than 
T. Chan’s and Bertaccini’s preconditioners. To further illustrate the clustering 
property in Theorem 2, we give in Figures 1,2 the spectra of the preconditioned 
matrices with the preconditioner S and the spectra with no preconditioner, for 
s = 12,48. 
Solvability of Runge-Kutta and Block-BVMs
Systems Applied to Scalar ODEs*

G. Di Lena and F. Iavernaro

Dipartimento di Matematica, Università di Bari,
Via E. Orabona 4, I-70125 Bari, Italy

Abstract. A characterization of $P_0$ matrices is reported and used to
derive simple necessary and sufficient conditions for the unique solvability
of a class of nonlinear systems of equations depending on a parameter. An
application to the problem of existence and uniqueness of the solutions
of one-step implicit schemes applied to scalar ODEs is also presented.



1 Introduction 

A square, real matrix $A = (a_{ij}) \in \mathbb{R}^{n \times n}$ is said to be a $P$ matrix ($P_0$ matrix)
if all its principal minors are positive (nonnegative). A well known characterization
of $P$ matrices is the following (see [8]): $A$ is a $P$ matrix if and only if
for each real vector $x \in \mathbb{R}^n$, $x \ne 0$, there exists an index $i \in \{1, \ldots, n\}$ such
that $x_i \sum_{j=1}^{n} a_{ij}x_j > 0$.

The analogous characterization of $P_0$ matrices is reported in the following
theorem.

Theorem 1. The following statements are equivalent:

(i) $A$ is a $P_0$ matrix;

(ii) for each real vector $x \in \mathbb{R}^n$, $x \ne 0$, there exists an index $i \in \{1, \ldots, n\}$ such
that

$$x_i \ne 0 \quad\text{and}\quad x_i \sum_{j=1}^{n} a_{ij}x_j \ge 0.$$

Proof. It is sufficient to observe that $A$ is a $P_0$ matrix if and only if $A + \varepsilon I$ is
a $P$ matrix for any $\varepsilon > 0$ ($I$ stands for the identity matrix of dimension $n$). The
assertion is then a direct consequence of the above mentioned characterization
of $P$ matrices. □
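The definitions above translate directly into a brute-force test. The following sketch is an illustration only (the function names and the test matrices are assumptions, not taken from the paper): it enumerates all principal minors, so its cost grows exponentially with $n$ and it is meant for small matrices.

```python
import itertools
import numpy as np

def principal_minors(A):
    """Yield the determinants of all principal submatrices of A."""
    n = A.shape[0]
    for k in range(1, n + 1):
        for idx in itertools.combinations(range(n), k):
            yield np.linalg.det(A[np.ix_(idx, idx)])

def is_P(A, tol=1e-12):
    """P matrix: every principal minor is positive (up to a tolerance)."""
    return all(m > tol for m in principal_minors(A))

def is_P0(A, tol=1e-12):
    """P0 matrix: every principal minor is nonnegative (up to a tolerance)."""
    return all(m >= -tol for m in principal_minors(A))

# Hypothetical example: principal minors are 1, 0 and 0, so A is P0 but not P.
A = np.array([[1.0, -1.0],
              [0.0, 0.0]])
eps = 1e-3
print(is_P0(A), is_P(A), is_P(A + eps * np.eye(2)))
```

The last call reflects the argument used in the proof: shifting a $P_0$ matrix by $\varepsilon I$ with $\varepsilon > 0$ gives a $P$ matrix.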

Stated differently, a $P_0$ matrix $A$ is a $P$ matrix or a vector $x$ may be found
with $x_i \ne 0$ and $\sum_{j=1}^{n} a_{ij}x_j = 0$. In the next section we consider an application
of this result to the problem of unique solvability of the nonlinear system of
equations

$$Ay - hBF(y) = b, \qquad (1)$$

* Work supported by MURST and GNIM. 
where $A$ and $B$ are real square matrices of dimension $n$ with $\det(A) \ne 0$, $y = [y_1, \ldots, y_n]^T$ is a vector of unknowns, $b = [b_1, \ldots, b_n]^T$ is a known term, $h$ is
a nonnegative parameter and $F(y) = [f_1(y_1), f_2(y_2), \ldots, f_n(y_n)]^T$, where $f_i : \mathbb{R} \to \mathbb{R}$ are continuous functions satisfying the monotonicity condition

$$\frac{f_i(x) - f_i(y)}{x - y} \le \mu \qquad \forall x, y \in \mathbb{R},\ x \ne y. \qquad (2)$$

We denote by $\mathcal{F}_\mu$ the set of all functions $F$ whose components $f_i$ satisfy (2).
Systems of the form (1) arise, for example, when an implicit one-step method
(such as a Runge-Kutta (RK) or a Boundary Value Method (BVM)) is used to
solve the scalar Initial Value Problem (IVP)

$$\begin{cases} y'(t) = f(t, y(t)), & t \in [t_0, t_f], \\ y(t_0) = y_0, \end{cases} \qquad (3)$$

where $f$ satisfies a one-sided Lipschitz condition with one-sided Lipschitz constant
$\mu$. In this case the parameter $h$ stands for the stepsize of integration. The
more general problem of determining conditions equivalent to the existence and
uniqueness of solutions of the internal stages of a RK method when $f$ is a vector
function has been fully studied (see for example [2,3,4,6,7,9,10] and references
therein). These conditions essentially lead to certain restrictions on the choice
of the stepsize $h$ which are independent of the dimension of the IVP. In Section
3 we show that such restrictions become weaker when the scalar case is considered.
For a reference on Boundary Value Methods, their definition, properties
and implementation techniques see for example [1,11,12].

For reasons that will be clear in the sequel it is more convenient to recast
Theorem 1 into an equivalent form. We recall that the Hadamard (or Schur)
product $x \circ y$ of two vectors $x, y \in \mathbb{R}^n$ is a vector of length $n$ whose components
are $(x \circ y)_i = x_i y_i$ (see [8]). The standard notation $x > 0$ ($x \ge 0$, $x < 0$, $x \le 0$)
is used to denote vectors with positive (nonnegative, negative, nonpositive)
components. Associated to a real matrix $A$ of dimension $n$ we consider the
following set of vectors:

$$\chi(A) = \{x \in \mathbb{R}^n \mid x \circ Ax \le 0 \text{ and } ((Ax)_i = 0 \Rightarrow x_i = 0)\}.$$

Observe that $\chi(A) \ne \emptyset$ since $0 \in \chi(A)$. It follows that

$$x \in \chi(A) \iff \begin{cases} x_i \sum_{j=1}^{n} a_{ij}x_j \le 0, & \forall i = 1, \ldots, n, \\ \sum_{j=1}^{n} a_{ij}x_j = 0 \Rightarrow x_i = 0, \end{cases}$$

and Theorem 1 may be rewritten in a more compact notation as follows:

$$A \text{ is a } P_0 \text{ matrix} \iff \chi(A) = \{0\}. \qquad (4)$$

We finally observe that $\chi(0) = \{0\}$ and for each $\beta \in \mathbb{R}$ and $x \in \chi(A)$,
$\beta x \in \chi(A)$, that is $\chi(A)$ is star shaped with respect to the origin.
2 $P_0$ Matrices and the Solution of the System (1)

Hereafter our interest is in the characterization of the unique solvability of the
system (1) under the assumption (2) in terms of conditions related to the matrices
$A$ and $B$ and the scalars $h$ and $\mu$. It is well known that this problem poses
an upper bound on the product $h\mu$. If $\gamma$ is a given real number, the system (1)
is said to be uniquely solvable on the interval $]-\infty, \gamma[$ if it admits a unique
solution whenever $h\mu \in\, ]-\infty, \gamma[$. Our goal is to find the optimal value of $\gamma$, say
$\bar\gamma$, defined as

$$\bar\gamma = \sup\{\gamma \mid \text{the system (1) is uniquely solvable on the interval } ]-\infty, \gamma[\}. \qquad (5)$$

The results reported in the present section will be later compared to those
known in the literature and derived in a more general context.

Since the existence of the solution of (1) on the open interval $]-\infty, \gamma[$ is a
consequence of its uniqueness (see [5,7,10]), we confine our study to this latter
question.

Suppose $b_1, b_2 \in \mathbb{R}^n$ are given and consider the systems $Ay^{(1)} - hBF(y^{(1)}) = b_1$ and $Ay^{(2)} - hBF(y^{(2)}) = b_2$. Subtracting yields

$$A\Delta y - hB\Delta F = \Delta b, \qquad (6)$$

with $\Delta y = y^{(2)} - y^{(1)}$, $\Delta F = F(y^{(2)}) - F(y^{(1)})$ and $\Delta b = b_2 - b_1$. Uniqueness
of the solution of (1) is then equivalent to requiring $\Delta b = 0 \Rightarrow \Delta y = 0$.

Observe that choosing $F(y) = \mu y$ (the linear case), a necessary condition for
uniqueness is found to be $\det(A - h\mu B) \ne 0$. Under this assumption the
equation (6) with $\Delta b = 0$ is equivalent to

$$\Delta y = h(A - h\mu B)^{-1}B(\Delta F - \mu\Delta y). \qquad (7)$$

The theorem below establishes the link between the present problem and $P_0$
matrices.

Theorem 2. Assume that $(A - h\mu B)$ is nonsingular and define $\mathcal{A} = h(A - h\mu B)^{-1}B$. The following statements are equivalent:

(a) the solution of (1) is unique for any $F \in \mathcal{F}_\mu$;

(b) $\chi(\mathcal{A}) = \{0\}$.

Proof. (a) ⇒ (b). Assume there exists $x \ne 0$ such that $x \in \chi(\mathcal{A})$ and set
$\Delta y = \mathcal{A}x$. For $i = 1, \ldots, n$ consider the linear functions $f_i(z) = a_i z$, where

$$a_i = \begin{cases} \mu + x_i/\Delta y_i & \text{if } x_i \ne 0, \\ \mu & \text{otherwise.} \end{cases}$$

This definition is well posed since $x_i \ne 0 \Rightarrow \Delta y_i \ne 0$. Furthermore, observing
that for each $i = 1, \ldots, n$, $x_i \Delta y_i \le 0$, it is easily seen that $F(y) = [f_1(y_1), \ldots, f_n(y_n)]^T \in \mathcal{F}_\mu$ and the vector $\Delta y \ne 0$ satisfies (7), which compared
to (a) gives a contradiction.

(b) ⇒ (a). Let $\Delta y$ be a solution of (7), with $F \in \mathcal{F}_\mu$. Defining $x = \Delta F - \mu\Delta y$,
equation (7) assumes the form

$$\Delta y = \mathcal{A}x. \qquad (8)$$

From the expression of $x$, one can verify that $\Delta y_i x_i \le 0$ and $\Delta y_i = 0 \Rightarrow x_i = 0$, which together with formula (8) imply that $x \in \chi(\mathcal{A})$. From (b) we deduce
$x = 0$ and consequently $\Delta y = 0$. □

Now we consider the family of matrices $\mathcal{A}_\gamma = (A - \gamma B)^{-1}B$ depending on
the parameter $\gamma \in \Omega \subset \mathbb{R}$. The set $\Omega$ consists of all values of $\gamma$ such that
the corresponding elements of the family are well defined, that is $A - \gamma B$ is
nonsingular. However it should be observed that if for a given $\gamma_1 \in \mathbb{R}$, $\mathcal{A}_{\gamma_1}$
exists and is a $P_0$ matrix, then for each $\gamma < \gamma_1$, $A - \gamma B$ is nonsingular and $\mathcal{A}_\gamma$ is
still a $P_0$ matrix. This property is a direct consequence of Theorem 2 and the fact
that uniqueness of the solution of (1) in the class $\mathcal{F}_{\mu_1}$ also implies uniqueness in
the class $\mathcal{F}_\mu$ for each $\mu < \mu_1$. It follows that the values of the parameter $\gamma$ which
make $\mathcal{A}_\gamma$ a $P_0$ matrix (if any exists) form an interval of the form $]-\infty, \delta]$ if
$\delta \in \Omega$ or $]-\infty, \delta[$ if $\delta \notin \Omega$: this provides the basis for characterizing the number
$\bar\gamma$ as defined in (5). In the sequel we adopt the convention $\sup \emptyset = -\infty$.

Theorem 3. The following expressions hold true for the scalar $\bar\gamma$:

$$\bar\gamma = \sup\{\gamma \in \Omega \mid \mathcal{A}_\gamma \text{ is a } P_0 \text{ matrix}\}; \qquad (9)$$

if $B$ is nonsingular,

$$\bar\gamma = \min\{\lambda \mid \lambda \text{ is a real eigenvalue of any principal submatrix of } B^{-1}A\}. \qquad (10)$$

Proof. Formula (9) is a natural consequence of Theorem 2 and the previous
discussion (we observe that if $\det(A - \bar\gamma B) \ne 0$ then $\bar\gamma$ is a maximum).

When $B$ is nonsingular, after a simple manipulation one can check that $\mathcal{A}_\gamma = (B^{-1}A - \gamma I)^{-1}$. From (9) and considering that the inverse of a $P_0$ matrix is a $P_0$
matrix, we are led to seek the values of $\gamma$ that make $(B^{-1}A - \gamma I)$ a $P_0$
matrix. Using the fact that $P$ matrices are characterized by having positive all
the real eigenvalues of their principal submatrices we arrive at formula (10). □
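Formula (10) can be evaluated by direct enumeration when $n$ is small. The sketch below is an illustration under stated assumptions: the function name `gamma_bar`, the tolerance used to decide that an eigenvalue is real, and the 2 × 2 test matrices are hypothetical and not taken from the paper. The number of principal submatrices is $2^n - 1$, so this is practical only for small dimensions.

```python
import itertools
import numpy as np

def gamma_bar(A, B, tol=1e-9):
    """Smallest real eigenvalue over all principal submatrices of B^{-1} A,
    following formula (10); B is assumed to be nonsingular."""
    C = np.linalg.solve(B, A)                    # C = B^{-1} A
    n = C.shape[0]
    best = np.inf
    for k in range(1, n + 1):
        for idx in itertools.combinations(range(n), k):
            ev = np.linalg.eigvals(C[np.ix_(idx, idx)])
            scale = max(1.0, np.abs(ev).max())
            # Keep only eigenvalues whose imaginary part is numerically zero.
            real = ev.real[np.abs(ev.imag) <= tol * scale]
            if real.size:
                best = min(best, real.min())
    return best

# Hypothetical test data: the principal submatrices of A have real
# eigenvalues {2}, {2} and {1, 3}.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
B = np.eye(2)
print(gamma_bar(A, B))
```

For this small example one can also check by hand that $(A - \gamma B)^{-1}B$ has positive principal minors for $\gamma$ slightly below 1 and a negative diagonal for $\gamma$ slightly above 1, in agreement with (9).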

In the case when $B$ is nonsingular, the number $\bar\gamma$ may be explicitly determined
by formula (10). Alternatively, working on the principal minors of the matrix
$\mathcal{A}_\gamma$, one can locate $\bar\gamma$ by means of an iterative procedure (such as the bisection
method) that produces a sequence $\{\gamma_k\}$ convergent to $\bar\gamma$. We consider further
applications in the next section.
3 Unique Solvability of RK and BVM Systems

As mentioned in the introduction, the study of conditions for the existence and
uniqueness of solutions of systems of the form (1) has been conducted by a number of authors
when the vectors $y$, $F(y)$ and $b$ have components in $\mathbb{R}^m$ with $m$ a given positive
integer. In such a general case formula (2) is replaced by $\langle f_i(x) - f_i(y), x - y \rangle \le \mu\|x - y\|^2$, $x, y \in \mathbb{R}^m$, where $\langle \cdot, \cdot \rangle$ is an inner product on $\mathbb{R}^m$ and the matrices
$A$ and $B$ are viewed as linear operators $(\mathbb{R}^m)^n \to (\mathbb{R}^m)^n$, that is, for
example, $Ay$ stands for $(A \otimes I)y$ with $I$ the $m$-dimensional identity matrix. The
application of a RK or a BVM formula to the solution of $m$-dimensional ODEs
leads to systems of this kind. In particular, for RK methods the solution of (1)
represents the vector of internal stages, so that the matrix $A$ is the identity matrix
while the coefficients of $B$ are defined by the Butcher array. In the case of
BVMs the system (1) itself represents the discrete counterpart of the continuous
problem and the entries of $A$ and $B$ define the way this discretization is carried
out. In all concrete cases the matrix $A$ is nonsingular and therefore all the results
obtained in the RK context are also available for BVMs (via a left multiplication
of the system by $A^{-1}$).

Kraaijevanger and Schneid [10] proved that when
$G = A^{-1}B$ is irreducible and nonsingular, the unique solvability on the interval
$]-\infty, \gamma[$ corresponds to requiring that the matrix $G^{-1} - \gamma I$ is Lyapunov diagonally
semi-stable, namely that a positive diagonal matrix $D$ exists such that
the matrix $(G^{-1} - \gamma I)^T D + D(G^{-1} - \gamma I)$ is positive semidefinite. More precisely,
they introduced the definition of suitability of the system (1) on intervals of the
form $]-\infty, \gamma[$, which means that it is uniquely solvable on that interval whatever
the choice of the dimension $m$. As one should expect (see for example
Lemma 2.3 in [10], page 135), the conditions for the unique solvability for a given
dimension $m$ also imply unique solvability for lower dimensions, while the converse
is true only for dimensions $m \ge n$. Their result is in fact independent of
the dimension $m$ and so is the value of $\bar\gamma$. To avoid confusion about the number
$\bar\gamma$ in the scalar and vector case, we introduce the notation $\tilde\gamma$ when referring
to the latter. The above mentioned authors showed that $\tilde\gamma = \sup_{D>0} \gamma_D(G)$,
with $\gamma_D(G) = \sup\{\gamma \mid (G^{-1} - \gamma I)^T D + D(G^{-1} - \gamma I) \text{ is positive semi-definite}\}$.

In our discussion $m$ has been fixed to one (scalar problems) and, since semi-stable
matrices form a proper subset of the wider class of $P_0$ matrices, we realize
that weaker restrictions on the product $h\mu$ occur in this case, namely methods
could in principle occur for which $\tilde\gamma < \bar\gamma$. While it is known that $\gamma_D(G)$ is the
smallest eigenvalue of $(D^{1/2}G^{-1}D^{-1/2} + (D^{1/2}G^{-1}D^{-1/2})^T)/2$, the computation
of $\sup_{D>0} \gamma_D(G)$ is difficult because the optimal $D$ is not known a priori.
However, for a wide class of Runge-Kutta schemes the number $\tilde\gamma$ has been successfully
determined, showing that the upper bound $\tilde\gamma \le \min_i (G^{-1})_{ii}$ is indeed
attained if $DG^{-1} + (DG^{-1})^T$ is a diagonal matrix. This is the case of Gauss,
Radau IA, Radau IIA, Lobatto IIIC methods (see for example [3,7,4]). Now,
since $\bar\gamma \le \min_i (G^{-1})_{ii}$ as well, it also follows that for these methods $\bar\gamma = \tilde\gamma$.
The same holds true for DIRK methods, for which the relation $\tilde\gamma = \min_i (1/G_{ii})$
has been proved. No concrete weakening emerges when passing from the vector
to the scalar problem for these methods; the following example shows that this
does not represent the general rule.

Example 1. Here we consider the Runge-Kutta formula used by Hairer and Wanner
[7] to state that a B-stable method on a contractive problem ($\mu = 0$) does
not always admit a solution. The matrix $G$ is defined as

$$G = \frac{1}{48}\begin{pmatrix} 3 & 0 & 3 & -6 \\ 6 & 9 & 0 & 1 \\ 5 & 18 & 9 & 0 \\ 12 & 15 & 18 & 3 \end{pmatrix},$$

while the abscissae and the weights are respectively $c = (0, 1/3, 2/3, 1)$ and
$b = (1/8, 3/8, 3/8, 1/8)$. It has been shown that a problem of dimension two
exists in the class $\mathcal{F}_0$ such that the system (1) does not have a solution: this also
shows that $\tilde\gamma \le 0$. It is easy to verify that $G$ is a $P$ matrix and hence from (9)
it follows that $\bar\gamma > 0$. A direct computation based on formula (10) gives $\bar\gamma = 3/4$.



Now we turn our attention to a class of block-Boundary Value Methods,
namely the Generalized Adams Methods (GAMs), which have been inserted into
a code (the code GAM [13]) that implements a variable-step variable-order strategy
to determine numerical solutions of Initial Value Problems. First we give
a brief account of how such methods are defined (for simplicity they will be
considered as applied to the scalar problem (3)).

Starting from a known estimation of the true solution $y(t)$ at a given time
(without loss of generality they may be assumed to be $y_0$ and $t_0$ respectively), an
order $p$ block-BVM computes, through system (1), a vector $y$ of approximations
of order $p + 1$ to $y(t)$ on the uniform mesh $t_j = t_0 + jh$, $j = 1, \ldots, n$, that
is $y_j = y(t_j) + O(h^{p+1})$, $j = 1, \ldots, n$. In particular, denoting by $\{\beta_{ij}\}_{i,j=1,\ldots,n}$
and $\{\beta_{i0}\}_{i=1,\ldots,n}$ the entries of $B$ and $b$ respectively, for a GAM of odd order
$p$ and dimension $n$, the $i$-th component of the system (1) is a linear multistep
formula of the form

$$y_i - y_{i-1} = h \sum_{j=-k_1^{(i)}}^{k_2^{(i)}} \beta_{i,i+j} f_{i+j}, \qquad i = 1, \ldots, n, \qquad (11)$$

with $k_1^{(i)}$ and $k_2^{(i)}$ nonnegative integers such that $k_1^{(i)} + k_2^{(i)} = p - 1$ and

$$k_1^{(i)} = \begin{cases} i & \text{for } i = 1, \ldots, (p-3)/2, \\ (p-1)/2 & \text{for } i = (p-1)/2, \ldots, n - (p-1)/2, \\ i - n + p - 1 & \text{for } i = n - (p-3)/2, \ldots, n. \end{cases}$$

From the above definition it is deduced that $A$ is bidiagonal and Toeplitz
with 1 and −1 as diagonal and lower diagonal entries. We also observe that for
GAMs (and in general for block-BVMs) the only link between the order $p$ and
the dimension $n$ of (1) is that the latter must be sufficiently large in order that
the matrices $A$ and $B$ may contain the coefficients of each formula (11). The
Table 1. Estimated values of $\bar\gamma$ for GAMs

order    $\bar\gamma^{(u)}$    $\bar\gamma^{(c)}$    $\bar\gamma^{(o)}$
3        0.9011     0.2943    0.9502
5        0.2646     0.2374    0.6674
7        -0.2228    0.1032    0.4591
9        -0.5532    0.1728    0.3577

Table 2. Optimal symmetric distributions of the stepsizes

order    diagonal entries of $\hat{H}$
3        1.0449 0.9551 0.9551 1.0449
5        0.8247 1.1138 1.0615 1.0615 1.1138 0.8247
7        0.6622 1.0070 1.1335 1.1972 1.1972 1.1335 1.0070 0.6622
9        0.6660 0.8630 1.1092 1.2336 1.2563 1.2336 1.1092 0.8630 0.6660



code GAM is based on the Generalized Adams Methods of orders 3, 5, 7 and 
9 and dimensions 4,6,8 and 9 respectively. For an explicit list of the coefficients 
Pij see [11]; more about definitions, properties and implementation techniques 
of GAMs may be found in [1,11,12]. 

In the first column of Table 1 we report the values of $\gamma$ for the GAMs used in
the code. Unfortunately, negative values of $\gamma$ occur for the GAMs of order 7 and
9. A negative value of $\gamma$ means that, given a stepsize $h > 0$ and a constant $\mu > 0$,
a scalar function $f(t,y)$ can be found in the class under consideration for which the system (1)
does not admit a unique solution. This can make the code behave irregularly:
a predicted stepsize may be rejected because the scheme that provides the solution
of (1) does not attain convergence.

A way to overcome this problem is to allow different stepsizes inside each
formula (11). A variable-step block-GAM takes the form

$$A y - H B F(y) = b, \qquad (12)$$

where $H = \mathrm{diag}(h_1, h_2, \dots, h_n)$. Considering that a single step of a block-BVM
covers a time interval of length $nh$, the system (12) is recast in the form (1) by
setting $h = \frac{1}{n}\sum_{i=1}^{n} h_i$, $\hat H = \frac{1}{h} H$, and $\hat B = \hat H B$. The diagonal elements $\hat h_i$ of
the matrix $\hat H$ define the mesh $\tau_0 = 0$, $\tau_i = \tau_{i-1} + \hat h_i$, $i = 1,\dots,n$, over the interval
$[0, n]$, and the question is how to choose the abscissae $\tau_i$ so that
the corresponding system (12) admits a positive value of $\gamma$. A first attempt is to
consider smaller stepsizes at the beginning and at the end of the time interval, a
technique that has been successfully used to improve convergence and stability
properties of block-BVMs. A Chebyshev distribution of the abscissae $\tau_i$ satisfies
this requirement and the corresponding values of $\gamma$ are reported in the column of
Table 1 labeled by $\gamma^{(c)}$. We see that positiveness of $\gamma$ is achieved for the orders 7
and 9, although $\gamma^{(c)}$ is worse than $\gamma^{(u)}$ (uniform mesh) for the orders 3 and 5. The
values $\gamma^{(o)}$ solve the problem $\gamma^{(o)} = \max\{\gamma \mid h_i = h_{n-i},\ i = 1,\dots,n/2\}$, and
have been determined using the Matlab optimization toolbox. The constraint of
a symmetric distribution avoids an undesirable growth of the condition number
of the matrix $A^{-1}B$, a prerequisite that guarantees well-conditioning of some
problems that emerge when handling block-BVMs (see for example [1,12]). The
values $h_i$ that correspond to the optimal $\gamma$ are reported in Table 2.
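
The normalization described above can be checked directly against the entries of Table 2. The following is only a small sketch: the helper names (`normalize`, `abscissae`, `is_symmetric`) are illustrative and not part of the GAM code, and the numbers are the four-digit values printed in Table 2, so the mean stepsize equals 1 only up to rounding.

```python
# Diagonal entries of the stepsize matrix, copied from Table 2 (four digits).
TABLE_2 = {
    3: [1.0449, 0.9551, 0.9551, 1.0449],
    5: [0.8247, 1.1138, 1.0615, 1.0615, 1.1138, 0.8247],
    7: [0.6622, 1.0070, 1.1335, 1.1972, 1.1972, 1.1335, 1.0070, 0.6622],
    9: [0.6660, 0.8630, 1.1092, 1.2336, 1.2563, 1.2336, 1.1092, 0.8630, 0.6660],
}


def normalize(stepsizes):
    """Return (h, scaled): h is the mean stepsize, scaled = stepsizes / h."""
    h = sum(stepsizes) / len(stepsizes)
    return h, [s / h for s in stepsizes]


def abscissae(scaled):
    """Mesh points: first point 0, each next point adds one scaled stepsize."""
    points = [0.0]
    for s in scaled:
        points.append(points[-1] + s)
    return points


def is_symmetric(values, tol=1e-12):
    """True if the list reads the same from both ends."""
    return all(abs(a - b) <= tol for a, b in zip(values, reversed(values)))


for order, entries in TABLE_2.items():
    h, scaled = normalize(entries)
    mesh = abscissae(scaled)
    # The last abscissa equals the block dimension n, so the mesh covers [0, n].
    print(order, len(entries), round(h, 4), is_symmetric(entries), round(mesh[-1], 6))
```

For each order the mean stepsize is 1 to within the rounding of the table, the distribution is symmetric about the midpoint, and the scaled mesh ends at the block dimension.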
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Abstract. This paper presents a new local perturbation bound for the 
continuous-time Lyapunov matrix equations, which is not formulated in 
terms of condition numbers. The new bound is a nonlinear, first order 
homogeneous function of the absolute perturbations in the data and is 
sharper than the linear local bounds based on condition numbers. 



1 Introduction 

The Lyapunov matrix equations (LME) are fundamental in the theory of linear 
systems. That is why, the problem of their reliable solution, including derivation 
of perturbation bounds, is of great practical interest. The conditioning of LME 
is well studied and different types of condition numbers are derived [1] - [6]. Un- 
fortunately, perturbation bounds, based on condition numbers, may eventually 
produce pessimistic results. 

In this paper a new local perturbation bound for the continuous-time LME
is presented. It is nonlinear, first order homogeneous, and tighter than the
local bounds based on condition numbers. A comparative study of the new and
existing local perturbation bounds is performed.

The following notations are used later on: $\mathcal{R}^{m\times n}$ - the space of real $m \times n$
matrices; $I_n$ - the unit $n \times n$ matrix; $A^\top = [a_{ji}]$ - the transpose of the matrix
$A = [a_{ij}]$; $\mathrm{vec}(A) \in \mathcal{R}^{mn}$ - the column-wise vector representation of the matrix
$A \in \mathcal{R}^{m\times n}$; $\Pi_{n^2} \in \mathcal{R}^{n^2\times n^2}$ - the vec-permutation matrix, such that $\mathrm{vec}(X^\top) =
\Pi_{n^2}\mathrm{vec}(X)$ for all $X \in \mathcal{R}^{n\times n}$; $A \otimes B = [a_{ij}B]$ - the Kronecker product of the
matrices $A$ and $B$; $\|\cdot\|_2$ - the spectral (or 2-) norm in $\mathcal{R}^{m\times n}$; $\|\cdot\|_F$ - the Frobenius
(or F-) norm in $\mathcal{R}^{m\times n}$. The notation ':=' stands for 'equal by definition'.

2 Problem Statement 

Consider the LME

$$F(X,P) := A^\top X + XA + Q = 0 \qquad (1)$$

where $X \in \mathcal{R}^{n\times n}$ is the unknown matrix, $A \in \mathcal{R}^{n\times n}$ and $Q = Q^\top \in \mathcal{R}^{n\times n}$ are
given matrices and $P := (A, Q)$.

We suppose that $0 \notin \{\lambda_i(A) + \lambda_k(A) : i \in \overline{1,n},\ k \in \overline{1,n}\}$, where $\lambda_i(A)$ are
the eigenvalues of the matrix $A$. Under this assumption the partial Fréchet
derivative $F_X$ of $F$ in $X$ is invertible and (1) has a unique solution $X = X^\top$.

Let the matrices $A$ and $Q$ be perturbed as

$$A \mapsto A + \Delta A, \qquad Q \mapsto Q + \Delta Q$$

and denote by $P + \Delta P$ the perturbed matrix pair $P$ in which $A$ and $Q$ are
replaced by $A + \Delta A$ and $Q + \Delta Q$. Then the perturbed equation is

$$F(Y, P + \Delta P) = 0. \qquad (2)$$

Since the operator $F_X$ is invertible, the perturbed equation (2) has a unique
solution $Y = X + \Delta X$, $Y = Y^\top$, in the neighborhood of $X$ if the perturbation
$\Delta P$ is sufficiently small.

Denote by

$$\Delta := [\Delta_A, \Delta_Q]^\top \in \mathcal{R}_+^2$$

the vector of absolute norm perturbations $\Delta_A := \|\Delta A\|_F$ and $\Delta_Q := \|\Delta Q\|_F$ in
the data matrices $A$ and $Q$.

In this paper we consider local bounds for the perturbation $\Delta_X := \|\Delta X\|_F$
in the solution of (1). These are bounds of the type

$$\Delta_X \le f(\Delta) + O(\|\Delta\|^2), \quad \Delta \to 0 \qquad (3)$$

where $f$ is a continuous function, non-decreasing in each of its arguments and
satisfying $f(0) = 0$. Particular cases of (3) are the well known linear perturbation
bounds [2] - [6]

$$\Delta_X \le K_A\Delta_A + K_Q\Delta_Q + O(\|\Delta\|^2) \qquad (4)$$

and

$$\Delta_X \le \sqrt{2}\,K_C\,\Delta_{\max} + O(\|\Delta\|^2) \qquad (5)$$

where $K_A$ and $K_Q$ are the individual absolute condition numbers of (1) relative
to the perturbations in $A$ and $Q$, $K_C$ is the overall absolute condition number
of (1) and $\Delta_{\max} = \max\{\Delta_A, \Delta_Q\}$.

In what follows the local linear bounds (4) and (5) are first derived using
the approach developed in [2,3]. Then a new perturbation bound of the type (3)
is given, where $f$ is not a linear but a first order homogeneous function of the
vector of absolute perturbations $\Delta$.



3 Condition Numbers 

Consider the conditioning of the LME (1). Since $F(X,P) = 0$, the perturbed
equation (2) may be written as

$$F(X + \Delta X, P + \Delta P) = F_X(\Delta X) + F_A(\Delta A) + F_Q(\Delta Q) + G(\Delta X, \Delta P) = 0$$

where

$$F_X(Z) = A^\top Z + ZA, \qquad F_A(Z) = Z^\top X + XZ, \qquad F_Q(Z) = Z$$

are the partial Fréchet derivatives of $F$ in the corresponding matrix arguments,
computed at the point $(X, P)$, and $G(\Delta X, \Delta P)$ contains second and higher order
terms in $\Delta X$, $\Delta P$.

Since the operator $F_X(\cdot)$ is invertible we get

$$\Delta X = \Phi(\Delta X, \Delta P) := -F_X^{-1} \circ F_A(\Delta A) - F_X^{-1} \circ F_Q(\Delta Q) - F_X^{-1}(G(\Delta X, \Delta P)). \qquad (6)$$

The relation (6) gives

$$\Delta_X \le K_A\Delta_A + K_Q\Delta_Q + O(\|\Delta\|^2) \qquad (7)$$

where the quantities

$$K_A = \|F_X^{-1} \circ F_A\|, \qquad K_Q = \|F_X^{-1} \circ F_Q\|$$

are the absolute condition numbers of (1) relative to the perturbations in $A$
and $Q$, respectively. Here $\|\mathcal{F}\|$ is the norm of the operator $\mathcal{F}$, induced by the
F-norm.

The calculation of the condition numbers $K_A$ and $K_Q$ is straightforward.
Denote by $M_X$, $M_A$ and $M_Q$ the matrix representations of the operators
$F_X(\cdot)$, $F_A(\cdot)$ and $F_Q(\cdot)$:

$$M_X = A^\top \otimes I_n + I_n \otimes A^\top, \qquad M_A = (I_{n^2} + \Pi_{n^2})(I_n \otimes X), \qquad M_Q = I_{n^2}.$$

Then

$$K_A = \|M_X^{-1}M_A\|_2, \qquad K_Q = \|M_X^{-1}\|_2. \qquad (8)$$

Relation (6) also gives

$$\Delta_X \le \sqrt{2}\,K_C\,\Delta_{\max} + O(\|\Delta\|^2), \quad \Delta \to 0 \qquad (9)$$

where

$$K_C = \|M_X^{-1}[M_A, I_{n^2}]\|_2 \qquad (10)$$

is the overall absolute condition number of LME (1).
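
To make the role of the matrix $M_X$ concrete, the sketch below solves a small LME through the vectorized system $M_X\,\mathrm{vec}(X) = -\mathrm{vec}(Q)$. It is an illustration only, written in plain Python with dense Gaussian elimination; the function names and the $2 \times 2$ test data are assumptions made for this example, and a production code would use a dedicated Lyapunov solver rather than the Kronecker form.

```python
def transpose(a):
    return [list(row) for row in zip(*a)]


def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def kron(a, b):
    """Kronecker product [a_ij * B] of two dense matrices (lists of rows)."""
    return [[x * y for x in ra for y in rb] for ra in a for rb in b]


def vec(a):
    """Column-wise vectorization."""
    return [a[i][j] for j in range(len(a[0])) for i in range(len(a))]


def unvec(v, n):
    return [[v[j * n + i] for j in range(n)] for i in range(n)]


def gauss_solve(m, rhs):
    """Gaussian elimination with partial pivoting."""
    n = len(m)
    aug = [list(m[i]) + [rhs[i]] for i in range(n)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= f * aug[c][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


def lyapunov_matrix(a):
    """M_X = A^T (x) I_n + I_n (x) A^T."""
    n = len(a)
    at = transpose(a)
    k1, k2 = kron(at, identity(n)), kron(identity(n), at)
    return [[p + q for p, q in zip(r1, r2)] for r1, r2 in zip(k1, k2)]


def solve_lyapunov(a, q):
    """Solve A^T X + X A + Q = 0 via M_X vec(X) = -vec(Q)."""
    x = gauss_solve(lyapunov_matrix(a), [-t for t in vec(q)])
    return unvec(x, len(a))


def residual(a, x, q):
    left, right = matmul(transpose(a), x), matmul(x, a)
    n = len(a)
    return [[left[i][j] + right[i][j] + q[i][j] for j in range(n)] for i in range(n)]


# Hypothetical test data: a stable 2 x 2 matrix A and Q = I.
A = [[-1.0, 2.0], [0.0, -3.0]]
Q = [[1.0, 0.0], [0.0, 1.0]]
X = solve_lyapunov(A, Q)
print(X)
print(residual(A, X, Q))
```

The same matrix $M_X$ is the one whose inverse enters (8) and (10); computing the spectral norms there additionally requires a singular value routine, which is left out of this sketch.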

4 First Order Homogeneous Perturbation Bound 

The local linear bounds (4) and (5) may produce pessimistic results.
At the same time it is possible to derive a local, first order homogeneous
bound which is sharper in general.

The operator equation (6) may be written in a vector form as

$$\mathrm{vec}(\Delta X) = N_1\mathrm{vec}(\Delta A) + N_2\mathrm{vec}(\Delta Q) - M_X^{-1}\mathrm{vec}(G(\Delta X, \Delta P)) \qquad (11)$$

where

$$N_1 := -M_X^{-1}M_A, \qquad N_2 := -M_X^{-1}.$$

The local linear bound (7), (8) is a corollary of (11):

$$\Delta_X = \|\Delta X\|_F = \|\mathrm{vec}(\Delta X)\|_2 \le \mathrm{est}_1(\Delta, N) + O(\|\Delta\|^2), \quad \Delta \to 0,$$

$$\mathrm{est}_1(\Delta, N) := \|N_1\|_2\Delta_A + \|N_2\|_2\Delta_Q = K_A\Delta_A + K_Q\Delta_Q,$$

where $N := [N_1, N_2]$.

Relation (11) also gives

$$\Delta_X \le \mathrm{est}_2(\Delta, N) + O(\|\Delta\|^2), \qquad \mathrm{est}_2(\Delta, N) := \|N\|_2\|\Delta\|_2, \quad \Delta \to 0.$$

Since $\|\Delta\|_2 \le \sqrt{2}\,\Delta_{\max}$, the bound $\mathrm{est}_2(\Delta, N)$ is less than or equal to the local
linear bound (9), (10).

The bounds $\mathrm{est}_1(\Delta, N)$ and $\mathrm{est}_2(\Delta, N)$ are alternative, i.e. which one is smaller
depends on the particular value of $\Delta$.

There is also a third bound, which is always less than or equal to $\mathrm{est}_1(\Delta, N)$.
We have

$$\Delta_X \le \mathrm{est}_3(\Delta, N) + O(\|\Delta\|^2), \qquad \mathrm{est}_3(\Delta, N) := \sqrt{\Delta^\top S(N)\Delta}, \quad \Delta \to 0,$$

where $S(N)$ is the $2 \times 2$ matrix with elements $S_{ij}(N) = \|N_i^\top N_j\|_2$. Since

$$\|N_i^\top N_j\|_2 \le \|N_i\|_2\|N_j\|_2$$

we get $\mathrm{est}_3(\Delta, N) \le \mathrm{est}_1(\Delta, N)$. Hence we have the overall estimate

$$\Delta_X \le \mathrm{est}(\Delta, N) + O(\|\Delta\|^2), \quad \Delta \to 0 \qquad (12)$$

where

$$\mathrm{est}(\Delta, N) := \min\{\mathrm{est}_2(\Delta, N), \mathrm{est}_3(\Delta, N)\}. \qquad (13)$$

The local bound $\mathrm{est}(\Delta, N)$ in (12), (13) is a nonlinear, first order homogeneous
and piece-wise real analytic function in $\Delta$.
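
The two facts used above, that $\mathrm{est}_3$ bounds the first order term and that it never exceeds $\mathrm{est}_1$, follow from a short calculation. The steps below are a sketch added for the reader; the shorthand $a := \mathrm{vec}(\Delta A)$, $q := \mathrm{vec}(\Delta Q)$ is introduced here only for brevity.

```latex
% First order term of (11), with a := vec(\Delta A), q := vec(\Delta Q),
% so that \|a\|_2 = \Delta_A and \|q\|_2 = \Delta_Q.
\begin{align*}
\|N_1 a + N_2 q\|_2^2
  &= a^\top N_1^\top N_1 a + 2\, a^\top N_1^\top N_2 q + q^\top N_2^\top N_2 q \\
  &\le \|N_1^\top N_1\|_2 \Delta_A^2 + 2\,\|N_1^\top N_2\|_2 \Delta_A \Delta_Q
       + \|N_2^\top N_2\|_2 \Delta_Q^2
   = \Delta^\top S(N)\, \Delta = \mathrm{est}_3^2(\Delta, N), \\[4pt]
\mathrm{est}_3^2(\Delta, N)
  &= \sum_{i,j=1}^{2} \|N_i^\top N_j\|_2\, \Delta_i \Delta_j
   \le \sum_{i,j=1}^{2} \|N_i\|_2 \|N_j\|_2\, \Delta_i \Delta_j
   = \bigl(\|N_1\|_2 \Delta_A + \|N_2\|_2 \Delta_Q\bigr)^2
   = \mathrm{est}_1^2(\Delta, N),
\end{align*}
% where \Delta_1 = \Delta_A and \Delta_2 = \Delta_Q.
```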



5 Numerical Examples 



Among many numerical experiments, the following one is reported. In
Matlab syntax, some block matrices are defined by: M=invhilb(n);
Z=zeros(n, n); J=ones(n, n). The matrix $A$ of the Lyapunov equation is then set
to

$$A = \begin{bmatrix} M & Z \\ J & M \end{bmatrix}.$$

Then the matrix $Q$ is computed from $A$ and $X$ with

$$X = \mathrm{scale} \cdot \begin{bmatrix} 1 & 1 & \cdots & 1 \\ 1 & 1 & \cdots & 1 \\ \vdots & \vdots & & \vdots \\ 1 & 1 & \cdots & 1 \end{bmatrix}$$


On the Local Sensitivity of the Lyapunov Equations 






where scale is an integer scaling factor. Lastly, the perturbations $\Delta A$ and $\Delta Q$ are
generated from $A$ and $Q$ by randomly perturbing their least significant bits.
Setting here $n = 4$, a moderately ill-conditioned $8 \times 8$ Lyapunov equation is
obtained. Numerical results are summarised below:



    scale   elin          est2          est1
    1       4.1091e-010   2.9161e-010   2.9142e-010
    10      4.2985e-009   4.0133e-009   3.0711e-009
    100     2.6129e-007   1.8672e-007   2.7182e-008
    1e4     2.9207e-003   2.0652e-003   1.6813e-006
    1e6     3.2627e+001   2.3070e+001   1.8975e-004
    1e10    3.3388e+009   2.3609e+009   1.9326e+000

    scale   est3          $\Delta_X$
    1       2.9142e-010   6.6102e-011
    10      3.0711e-009   3.7805e-010
    100     2.7182e-008   7.9174e-009
    1e4     1.6813e-006   6.8222e-007
    1e6     1.8975e-004   7.5801e-005
    1e10    1.9326e+000   9.4309e-001



Clearly, the new local bounds can give much sharper results than the standard
linear perturbation bound "elin" (5). For comparison purposes, $\Delta_X$ has been
computed, from its definition, through the numerical solution of the original
Lyapunov equation and its perturbed version. If relative errors are used, the new
bounds remain, for this numerical test, 300% to 900% better than the linear one.

6 Conclusion 

New local perturbation bounds have been presented. These nonlinear bounds
can be much sharper than their linear counterparts, depending on the problem
data. This conclusion remains true if relative errors are considered.
In such a development $K_A$, $K_Q$, $K_C$ must be respectively replaced
by $K_A/\|X\|_F$, $K_Q/\|X\|_F$, $K_C/\|X\|_F$, and $M_A$, $M_Q$ by $\|A\|_F M_A$, $\|Q\|_F M_Q$.
Extensions of these results to nonlocal analysis are under investigation.
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Abstract. In this paper, the level set method is coupled with the bound- 
ary element method to simulate dynamic powder consolidation of metals 
based on linear elastostatics theory. We focus on the case of two particles 
that are in contact. The boundaries of the two particles are expressed 
as the zero level curves of two level set functions. The boundary integral 
equations are discretized using the piecewise linear elements at some pro- 
jections of irregular grid points on the boundaries of the two particles. 
Numerical examples are also provided. 

1 Introduction 

The application of large amplitude stress waves for materials processing and 
powder compaction has been of increasing interest in recent years [3,7]. The 
technique is also used for materials synthesis where the stress wave can pro- 
mote metallurgical reactions between two or more pure powders to produce 
alloy phases. When powder consolidation is of interest, it is important to under- 
stand the interaction, deformation, and bonding of particles in response to the 
stress wave. But the understanding of the dynamic process is far from complete.
There are few papers on numerical simulations in the literature. A finite element
method [1] gives some microscopic analysis of a few particles. However, it seems
that one cannot afford to generate a body-fitted grid for thousands of particles
at every time step.

In this paper, we develop a boundary element-level set method to simulate 
the solidification process of metal particles. The choice of the boundary element 
method is based on the fact that we are only interested in the motion (deforma- 
tion) of the boundaries of the particles, and the boundary integral equations are 
available and well understood. The use of the level set method [6] is to eliminate 
the cost of the grid generation and to simplify simulations for three dimensional 
problems. The projections of irregular grid points serve as a bridge between the
boundaries of the particles and the underlying Cartesian grid.
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2 The Boundary Integral Equations 



We consider a simplified model that involves only two interacting particles, see
Fig. 1 for an illustration. We assume that a small but fixed traction/pressure is
applied to a portion of the boundary of the left particle while a portion of the second
particle is fixed against a wall. We expect the particles to deform and want to
know the position of the particles and the traction along the boundaries. For a
problem with many particles, we can decompose the particles into groups of two
particles using domain decomposition techniques.

We assume that the deformation is small. Then, from linear elastostatics
theory (see, for example, [2]), the traction $\mathbf{p}$ and the deformation $\mathbf{u}$, in the vector
form

$$\mathbf{u} = \begin{bmatrix} u_1 \\ u_2 \end{bmatrix}, \qquad \mathbf{p} = \begin{bmatrix} p_1 \\ p_2 \end{bmatrix},$$

are coupled by the following boundary integral equations:



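$$c\,\mathbf{u}(\xi) + \int_\Gamma \mathbf{p}^*(\xi, \mathbf{x})\,\mathbf{u}(\mathbf{x})\,d\Gamma(\mathbf{x}) = \int_\Gamma \mathbf{u}^*(\xi, \mathbf{x})\,\mathbf{p}(\mathbf{x})\,d\Gamma(\mathbf{x}) + \int_\Omega \mathbf{u}^*(\xi, \mathbf{x})\,\mathbf{b}(\mathbf{x})\,d\Omega(\mathbf{x}), \qquad (2)$$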



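where $\mathbf{b}$ is the body force, $c_{ik} = \frac{1}{2}\delta_{ik}$, where $\delta_{ik}$ is the Kronecker delta, $\mathbf{p}^*$ and
$\mathbf{u}^*$ are the Green's functions

$$\mathbf{p}^* = \begin{bmatrix} p^*_{11} & p^*_{12} \\ p^*_{21} & p^*_{22} \end{bmatrix}, \qquad \mathbf{u}^* = \begin{bmatrix} u^*_{11} & u^*_{12} \\ u^*_{21} & u^*_{22} \end{bmatrix}$$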



with 

1 / dv 

" 4^(1 -t.)r t F fe ] + (1 - 2i^) {ni r, k 

where 




The Lamé constants can be expressed in terms of the more familiar shear
modulus $G$, modulus of elasticity $E$ and Poisson's ratio $\nu$ by the following formulae:

$$\mu = G = \frac{E}{2(1+\nu)}, \qquad \lambda = \frac{\nu E}{(1+\nu)(1-2\nu)}. \qquad (4)$$
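
As a quick numerical illustration of (4), the sketch below converts between $(E, \nu)$ and the Lamé constants. The function names are illustrative; the sample values $G = 26$, $\nu = 0.33$ are the material parameters quoted for one of the particles in the numerical examples of this paper.

```python
def lame_constants(E, nu):
    """Return (mu, lam) from the modulus of elasticity E and Poisson's ratio nu."""
    if not -1.0 < nu < 0.5:
        raise ValueError("Poisson's ratio must lie in (-1, 0.5)")
    mu = E / (2.0 * (1.0 + nu))
    lam = nu * E / ((1.0 + nu) * (1.0 - 2.0 * nu))
    return mu, lam


def youngs_modulus(G, nu):
    """Invert mu = G = E / (2 (1 + nu)) for E."""
    return 2.0 * G * (1.0 + nu)


G, nu = 26.0, 0.33
E = youngs_modulus(G, nu)
mu, lam = lame_constants(E, nu)
print(E, mu, lam)
```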



In our simulation, the body force is zero. $\Gamma = \Gamma_k$, $k = 1$ or $k = 2$, is one of
the boundaries of the two particles. We want to evaluate the displacement $\mathbf{u}$
along the boundaries so that we can determine the location of the boundaries.



A Level Set-Boundary Element Method 529 



3 Numerical Method 

We use a rectangular domain $\Omega = [a, b] \times [c, d]$ to enclose the two particles and
generate a Cartesian grid

$$x_i = a + i h_x, \quad i = 0, 1, \dots, m, \qquad (5)$$

$$y_j = c + j h_y, \quad j = 0, 1, \dots, n. \qquad (6)$$

We use the zero level sets of two functions $\varphi_1(x,y)$ and $\varphi_2(x,y)$ to express the
two particles respectively:

$$\varphi_k(x,y) \begin{cases} < 0 & \text{inside the } k\text{-th particle}, \\ = 0 & \text{on the boundary of the } k\text{-th particle}, \\ > 0 & \text{outside the } k\text{-th particle}, \end{cases} \qquad (7)$$

where $k = 1$ or $k = 2$. Since the two particles are immiscible, they share part of a
common boundary.

To use the boundary element method, we need to have some discrete points
on the boundaries. We choose some of the projections of irregular grid points on
the boundaries. An irregular grid point $(x_i, y_j)$ is a grid point at which one of
the level set functions $\varphi_k(x_i, y_j)$ changes sign in the standard five point stencil
centered at $(x_i, y_j)$.
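
The definition of an irregular grid point can be sketched in a few lines. This is an illustration under stated assumptions: the level set function is stored as a nested list `phi[i][j]`, a value that is exactly zero is counted as a sign change, boundary grid points are skipped, and the test circle (center $(0.31, 0)$, radius $0.32$, represented by its signed distance) and the grid are chosen only for the example.

```python
import math


def make_grid(a, b, c, d, m, n):
    """Cartesian grid x_i = a + i*hx, y_j = c + j*hy."""
    hx, hy = (b - a) / m, (d - c) / n
    xs = [a + i * hx for i in range(m + 1)]
    ys = [c + j * hy for j in range(n + 1)]
    return xs, ys


def is_irregular(phi, i, j):
    """True if phi changes sign within the five-point stencil centered at (i, j)."""
    values = [phi[i][j], phi[i - 1][j], phi[i + 1][j], phi[i][j - 1], phi[i][j + 1]]
    lo, hi = min(values), max(values)
    return lo <= 0.0 <= hi and lo < hi


def irregular_points(phi):
    """Indices (i, j) of all interior irregular grid points."""
    m, n = len(phi) - 1, len(phi[0]) - 1
    return [(i, j) for i in range(1, m) for j in range(1, n) if is_irregular(phi, i, j)]


def circle(x, y, xc=0.31, yc=0.0, r=0.32):
    """Signed distance to a circle: negative inside, positive outside."""
    return math.hypot(x - xc, y - yc) - r


xs, ys = make_grid(-0.1, 0.7, -0.4, 0.4, 40, 40)
phi = [[circle(x, y) for y in ys] for x in xs]
points = irregular_points(phi)
print(len(points))
```

Because a signed distance function changes by at most the mesh width between neighboring grid points, every point found this way lies within one mesh width of the boundary.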

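The projection of an irregular grid point $\mathbf{x} = (x_i, y_j)$ on the boundary is
determined by

$$\mathbf{x}^* = \mathbf{x} + \alpha\,\mathbf{q}, \quad \text{where} \quad \mathbf{q} = \frac{\nabla\varphi_k}{\|\nabla\varphi_k\|_2}, \qquad (8)$$

and $\alpha$ is determined from the following quadratic equation:

$$\varphi_k(\mathbf{x}) + \|\nabla\varphi_k\|_2\,\alpha + \frac{1}{2}\left(\mathbf{q}^\top \mathrm{He}(\varphi_k)\,\mathbf{q}\right)\alpha^2 = 0, \qquad (9)$$

where $\mathrm{He}(\varphi_k)$ is the Hessian matrix of $\varphi_k$ evaluated at $\mathbf{x}$. All the first and second
order derivatives at the irregular grid point are evaluated using the second order
central finite difference schemes.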
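
The projection step can be illustrated as follows. This is a sketch, not the authors' implementation: the derivatives are supplied analytically instead of by central differences, the root of smaller magnitude of the quadratic is selected, and the quadratic level set function of a circle is used as a test case because its second order expansion along $\mathbf{q}$ is exact.

```python
import math


def project_to_interface(x, y, phi, grad, hess):
    """Project (x, y) onto {phi = 0} along q = grad(phi) / |grad(phi)|."""
    p = phi(x, y)
    gx, gy = grad(x, y)
    gnorm = math.hypot(gx, gy)
    qx, qy = gx / gnorm, gy / gnorm
    (hxx, hxy), (hyx, hyy) = hess(x, y)
    # Coefficients of a*alpha^2 + b*alpha + c = 0.
    a = 0.5 * (qx * (hxx * qx + hxy * qy) + qy * (hyx * qx + hyy * qy))
    b = gnorm
    c = p
    disc = b * b - 4.0 * a * c
    if abs(a) < 1e-14 or disc < 0.0:
        # Fall back to the first order step when the quadratic degenerates.
        alpha = -c / b
    else:
        # Root of smaller magnitude, written to avoid cancellation (b > 0).
        alpha = -2.0 * c / (b + math.sqrt(disc))
    return x + alpha * qx, y + alpha * qy


# Test case (an assumption for this example): circle of radius 0.32 centered at (0.31, 0).
def phi(x, y):
    return (x - 0.31) ** 2 + y ** 2 - 0.32 ** 2


def grad(x, y):
    return 2.0 * (x - 0.31), 2.0 * y


def hess(x, y):
    return (2.0, 0.0), (0.0, 2.0)


inside = project_to_interface(0.6, 0.1, phi, grad, hess)
outside = project_to_interface(0.0, 0.3, phi, grad, hess)
print(inside, phi(*inside))
print(outside, phi(*outside))
```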



3.1 Contact of Two Particles 

We use two level set functions to represent two immiscible particles and update 
their positions. On the part of the contact, both the level set functions should 
be zero. 

Given two level set functions whose zero level curves intersect with each
other, that is, $\varphi_{1,ij} < 0$ and $\varphi_{2,ij} < 0$, we modify the level set functions in the
following way



^k,ij 



‘Pk,ij + 5 



( 10 ) 
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where 



2 



( 11 ) 



After such adjustment, the two level set functions can only have contact but 
not overlap, see Fig. 1 for an illustration. Note that it may be necessary to 
re-initialize the level set functions after such an adjustment. 



(a) (b) 





Fig. 1. Contact adjustment of the boundaries of two particles, (a): Two zero level 
set functions that overlap with each other, (b): The zero level curves (boundaries) 
of two particles after the adjustment 



3.2 Set-up the Linear System of Equations 



Since the projections on the zero level sets are not equally spaced, we use the
piecewise linear basis functions to approximate $\mathbf{u}$ and $\mathbf{p}$, and discretize the
boundary integral equation to get a second order accurate method. For example,
the displacement $\mathbf{u}$ between two nodal points can be interpolated using

$$\mathbf{u}(\xi) = \phi_1\mathbf{u}^1 + \phi_2\mathbf{u}^2 = [\phi_1, \phi_2] \begin{bmatrix} \mathbf{u}^1 \\ \mathbf{u}^2 \end{bmatrix}. \qquad (12)$$

In the expression above, $\xi$ is the dimensionless coordinate varying from $-1$ to
$+1$ and the two interpolation functions are

$$\phi_1 = \frac{1}{2}(1 - \xi); \qquad \phi_2 = \frac{1}{2}(1 + \xi). \qquad (13)$$

Given a projection $(x_i^*, y_i^*)$ on one of the boundaries $\varphi_k = 0$, the next point
$(x_{i+1}^*, y_{i+1}^*)$ is determined as the closest projection in a small neighborhood of
$(x_i^*, y_i^*)$ that satisfies

$$\nabla\varphi_k(x_i^*, y_i^*) \cdot \nabla\varphi_k(x_{i+1}^*, y_{i+1}^*) > 0,$$

where $k = 1$ or $k = 2$. The gradient at the projection is computed using the
bi-linear interpolation from those at the neighboring grid points evaluated with
the standard central finite difference scheme. We refer the reader to [4,5] for
detailed information on the bi-linear interpolation between projections and grid
points.
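
The linear interpolation (12), (13) on a boundary element can be sketched as below. The function names and the nodal values are illustrative; using the same two functions to interpolate the geometry of the element is a common boundary element convention and is an assumption here, not something stated in the text.

```python
def shape_functions(xi):
    """Linear interpolation functions on the reference element, xi in [-1, 1]."""
    return 0.5 * (1.0 - xi), 0.5 * (1.0 + xi)


def interpolate(xi, value1, value2):
    """Interpolate a vector quantity between its two nodal values."""
    phi1, phi2 = shape_functions(xi)
    return tuple(phi1 * a + phi2 * b for a, b in zip(value1, value2))


# Hypothetical nodal displacements at the two ends of one element.
u1, u2 = (1.0, 2.0), (3.0, 6.0)
for xi in (-1.0, 0.0, 1.0):
    print(xi, interpolate(xi, u1, u2))
```

The two functions sum to one for every $\xi$, so a rigid translation of the nodes is reproduced exactly along the element.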

The matrix-vector form of the linear system of equations at a particular
node $i$ can be written as

$$c^i\mathbf{u}^i + \sum_{j=1}^{N} H^{ij}\mathbf{u}^j = \sum_{j=1}^{N} G^{ij}\mathbf{p}^j, \qquad (14)$$

where $N$ is the total number of nodes or the projections that are used to set up
the equations, and $H^{ij}$ and $G^{ij}$ (both are $2 \times 2$ matrices) are influence matrices.
To avoid an ill-conditioned system and reduce the size of the linear system
of equations, we use only the projections of irregular grid points from one
particular side. For example, we use the projections of irregular grid points where
$\varphi_1(x_i, y_j) < 0$ and those projections of irregular grid points where $\varphi_2(x_i, y_j) > 0$.
In this way, along the contact of the two particles, we can use just one projection
at irregular grid points.

We use Gaussian quadrature of order four, which is an open formula, to
evaluate the integrals $H^{ij}$ and $G^{ij}$ for $i \ne j$. If $i = j$, the integral is a singular
Cauchy principal integral and we use the rigid body condition

$$H^{ii} = -\sum_{j=1,\ j \ne i}^{N} H^{ij} \qquad (15)$$

to evaluate the diagonal entries. The details about the boundary element method
can be found in Chapter 3 of Brebbia and Dominguez's book [2].



3.3 Velocity Evaluation 

In order to use the level set function, we need to evaluate the velocity at the
grid points in a computational tube. At those grid points where the projections
are used to set up the linear system of equations, we directly shift the velocity
to the grid points. At those grid points where the projections are not used, for
example, $\varphi_1(x_i, y_j) > 0$, we use the velocity at the closest projection from the
other side where $\varphi_1(x_i, y_j) < 0$.

After we have evaluated the velocity $(u_{k,1}, u_{k,2})$ at irregular grid
points, we need to extend the normal velocity

$$V_k = u_{k,1}n_x + u_{k,2}n_y, \quad k = 1 \text{ or } k = 2, \qquad (16)$$

where $(n_x, n_y) = \nabla\varphi_k/\|\nabla\varphi_k\|_2$ is the unit normal direction, to all grid points
inside a computational tube $|\varphi_k| \le \delta$ surrounding the boundary of the particle,
where $\delta = Ch$ is the width of the computational tube. This is done through
an upwind scheme

$$\frac{\partial V_k}{\partial t} \pm \nabla V_k \cdot \frac{\nabla\varphi_k}{\|\nabla\varphi_k\|_2} = 0, \qquad (17)$$

$k = 1$ or $k = 2$, which propagates $V_k$ along the normal direction away from
the interface. The sign is determined from the normal direction of the level set
function.



3.4 Update the Level Set Functions 

Once we have obtained the normal velocity in the computational domain, we
can update the level set functions by solving one step of the Hamilton-Jacobi
equation

$$\frac{\partial\varphi_k}{\partial t} + V_k\|\nabla\varphi_k\|_2 = 0, \quad k = 1 \text{ or } k = 2. \qquad (18)$$

The zero level sets $\varphi_k = 0$ then give the new location of the boundaries of the
two particles.

We summarize our algorithm below: 

- Set up the problem: input the material parameters and initialize
the two level set functions that represent the two particles.

- Adjust the level set functions at grid points where the two level set functions
are both negative, to treat the contact part.

- Find the projections of irregular grid points inside the first particle and
outside the second particle.

- Find the next point for each projection on the boundaries to form the line
segment needed in the boundary element method.

- Set up the system of equations using the Gaussian quadrature of order four
at all selected projections for each level set function. If $\mathbf{p}$ is known then $\mathbf{u}$ is
unknown and vice versa. At contact, both $\mathbf{p}$ and $\mathbf{u}$ are unknowns. Use the
rigid body condition to compute the diagonal entries.

- Shift the velocity to irregular grid points.

- Extend the normal velocity to a computational tube with a pre-selected
width $\delta$.

- Update the two level set functions by solving the Hamilton-Jacobi equation.

- Repeat the process if necessary.

4 Numerical Examples 

We have done a number of numerical experiments. The results are reasonably
good and are within the regime of linear elasticity.

Example 1. The material parameters for the first and second particles are

$$G_1 = \mu_1 = 26, \quad \nu_1 = 0.33, \quad G_2 = \mu_2 = 83, \quad \nu_2 = 0.27.$$

The boundaries of the initial two particles are the circles

$$(x - 0.31)^2 + y^2 = 0.32^2, \qquad (x + 0.21)^2 + y^2 = 0.22^2,$$

before the adjustment. From the left, we apply a traction $p$,

$$p(x,y) = C\cos\left(\frac{\pi x}{2\epsilon}\right), \quad \text{for } |x| < \epsilon,$$

on the part of the boundary of the left particle, where we take $C = 5$ and $\epsilon = 0.1$.
On the right, we fix the displacement $\mathbf{u} = 0$ if $|x - x_{\max}| < 0.05$ along the part
of the boundary of the right particle, where $x_{\max}$ is the largest $x$ coordinate of
the projections of irregular grid points. Fig. 2 is the computational result using
our method.



[Figure 2 panel title: (120,60), (26,0.33), (83,0.27), (0.31,0.32), (0.21,0.22); source str=5, cos dist.]



Fig. 2. Numerical result of Example 1 using a 120 by 60 grid. The upper half 
picture is the original particles; the lower half is the computed result 



Example 2. The material parameters for the first and second particles are

$$G_1 = \mu_1 = 83, \quad \nu_1 = 0.33, \quad G_2 = \mu_2 = 26, \quad \nu_2 = 0.27.$$

The boundaries of the initial two particles are the circles

$$(x + 0.21)^2 + y^2 = 0.22^2, \qquad (x - 0.31)^2 + y^2 = 0.32^2,$$

before the adjustment. The rest of the set-up is the same as in Example 1. Fig. 3 is the
computational result using our method.

5 Conclusion and Acknowledgment 

A new numerical method that couples the boundary element method with the
level set method is proposed in this paper to simulate multiple particles under linear
elasticity. The new method can handle the contact of two particles easily.

[Figure 3 panel title: m=160, n=80, $G_1$=83, $\nu_1$=0.33, $G_2$=26, $\nu_2$=0.27]
Fig. 3. Numerical result of Example 2 using a 160 by 80 grid. The upper half 
picture is the original particles; the lower half is the computed result 
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Abstract. It was recently shown that block-circulant preconditioners 
applied to a conjugate gradient method used to solve structured sparse 
linear systems arising from 2D or 3D elliptic problems have good numer- 
ical properties and a potential for high parallel efficiency. In this note 
parallel performance of a circulant block-factorization based precondi- 
tioner applied to a 3D model problem is investigated. The aim of the 
presentation is to report on the experimental data obtained on SUN Enterprise
3000, SGI/Cray Origin 2000, Cray J-9x, Cray T3E computers
and on two PC clusters.



1 Introduction 

Let us consider numerical solution of a self-adjoint second order 3D linear bound- 
ary value problem of elliptic type. After discretization, such a problem results 
in a linear system Ax = b, where A is a sparse symmetric positive definite ma- 
trix. In the computational practice, large-scale problems of this class are most 
often solved by Krylov subspace iterative (e.g. conjugate gradient) methods. 
Each step of such a method requires only a single matrix-vector product and 
allows exploitation of sparsity of $A$. The rate of convergence of these methods
depends on the condition number $\kappa$ of the matrix $A$ (smaller $\kappa(A)$ results in
faster convergence). Unfortunately, for second order 3D elliptic problems, usually
$\kappa(A) = O(N^{2/3})$, where $N$ is the size of the discrete problem, and hence it
grows rapidly with $N$. To alleviate this problem, iterative methods are almost
always used with a preconditioner $M$. The preconditioner is chosen with two
criteria in mind: to minimize $\kappa(M^{-1}A)$ and to allow efficient computation of the
product $M^{-1}v$ for any given vector $v$. These two goals are often in conflict and
a lot of research has been done devising preconditioners that strike a balance
between them. Recently, a third aspect has been added to the above two, namely,
the parallel efficiency of the iterative method (and thus the preconditioner).

Among the most popular and most successful preconditioners are the
incomplete LU (ILU) factorizations. Unfortunately, standard ILU preconditioners
have a limited degree of parallelism. Attempts to modify them and introduce
more parallelism often result in a deterioration of the convergence rate. R. Chan
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and T. F. Chan [2] proposed another class of preconditioners based on averaging 
coefficients of A to form a block-circulant approximation. The block-circulant 
preconditioners are highly parallelizable but they are very sensitive to a possible 
high variation of the coefficients of the elliptic operator. To reduce this sensitivity 
a new class of circulant block- factorization (CBF) preconditioners [5] was intro- 
duced by Lirkov, Margenov and Vassilevski. Recently a new CBF preconditioner 
for 3D problems was introduced in [3,4]. 

The main goal of this note is to report on the parallel performance of the
PCG method with a circulant block-factorization preconditioner applied to a
model 3D linear PDE of elliptic type. Results of experiments performed on Sun
Ultra-Enterprise, Cray J-9x and T3E, SGI/Cray Origin 2000 high performance
computers and on two PC clusters are presented and analyzed.

We proceed as follows. In Section 2 we sketch the algorithm of the parallel 
preconditioner (for more details see [3,4]). Section 3 contains the theoretical 
estimate of its arithmetical complexity. Finally, in Section 4 we report the results 
of our experiments. 



2 Circulant Block-Factorization 



Let us recall that a circulant matrix $C$ has the form $(C_{k,j}) = (c_{(j-k) \bmod m})$,
where $m$ is the dimension of $C$. Let us also denote by $C = (c_0, c_1, \dots, c_{m-1})$
the circulant matrix with the first row $(c_0, c_1, \dots, c_{m-1})$. Any circulant matrix
can be factorized as $C = F\Lambda F^*$ where $\Lambda$ is a diagonal matrix containing the
eigenvalues of $C$, and $F$ is the Fourier matrix of the form

$$F_{jk} = \frac{1}{\sqrt{m}}\, e^{2\pi \mathrm{i}\, jk/m}, \quad 0 \le j, k \le m-1, \qquad (1)$$

where $F^* = \bar{F}^\top$ denotes the adjoint matrix of $F$.
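
The factorization of a circulant matrix by the Fourier matrix can be verified numerically. The sketch below assumes the convention $F_{jk} = e^{2\pi \mathrm{i} jk/m}/\sqrt{m}$ for the Fourier matrix and checks, column by column, that each column of $F$ is an eigenvector of $C$; the first row used in the example is arbitrary.

```python
import cmath


def circulant(first_row):
    """Circulant matrix with entries C[k][j] = c[(j - k) mod m]."""
    m = len(first_row)
    return [[first_row[(j - k) % m] for j in range(m)] for k in range(m)]


def fourier_column(m, k):
    """k-th column of the unitary Fourier matrix (assumed sign convention)."""
    return [cmath.exp(2j * cmath.pi * j * k / m) / m ** 0.5 for j in range(m)]


def eigenvalues(first_row):
    """Eigenvalues of the circulant matrix: a discrete Fourier sum of its first row."""
    m = len(first_row)
    return [
        sum(c * cmath.exp(2j * cmath.pi * s * k / m) for s, c in enumerate(first_row))
        for k in range(m)
    ]


def matvec(a, v):
    return [sum(x * y for x, y in zip(row, v)) for row in a]


row = [4.0, -1.0, 0.0, -1.0]
C = circulant(row)
lams = eigenvalues(row)
for k, lam in enumerate(lams):
    f = fourier_column(len(row), k)
    err = max(abs(a - lam * b) for a, b in zip(matvec(C, f), f))
    print(k, lam, err)
```

In practice the eigenvalues are computed with an FFT of the first row rather than with the explicit sums used here.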

The CBF preconditioning technique incorporates the circulant approximations
into the framework of LU block-factorization. Let us consider a 3D elliptic
problem (see also [3]) on the unit cube with Dirichlet boundary conditions. If
the domain is discretized on a uniform grid with $n_1$, $n_2$ and $n_3$ grid points along
the coordinate directions, and if a standard (for such a problem) seven-point
FDM (FEM) approximation is used, then the stiffness matrix $A$ admits a
block-tridiagonal structure. The matrix $A$ can be written in the form

$$A = \mathrm{tridiag}(-A_{i,i-1}, A_{i,i}, -A_{i,i+1}), \quad i = 1, 2, \dots, n_1,$$

where $A_{i,i}$ are block-tridiagonal matrices which correspond to the $x_1$-plane and
the off-diagonal blocks are diagonal matrices. In this case the general CBF
preconditioning approach is applied to construct the preconditioner $M_{CBF}$ in the
form

$$M_{CBF} = \mathrm{tridiag}(-C_{i,i-1}, C_{i,i}, -C_{i,i+1}), \quad i = 1, 2, \dots, n_1, \qquad (2)$$

where $C_{i,j} = \mathrm{Block\text{-}Circulant}(A_{i,j})$ is a block-circulant approximation of the
corresponding block $A_{i,j}$. The stiffness matrix $A$ and the preconditioner $M_{CBF}$
are $N \times N$ matrices where $N = n_1 n_2 n_3$. The relative condition number of the
CBF preconditioner for the model (Laplace) 3D problem for $n_1 = n_2 = n_3 = n$
is (for derivation see [3]):

$$\kappa(M_{CBF}^{-1}A) \le 4n. \qquad (3)$$



2.1 Parallel Circulant Block- Factorization Preconditioner 

The basic advantage of circulant preconditioners is their inherent parallelism.
Let us now describe how to implement in parallel an application of the inverse
of the preconditioner to a given vector. Using the standard LU factorization
procedure, we can first split $M = D - L - U$ into its block-diagonal and strictly
block-triangular parts respectively. Then the exact block-factorization of $M$ can
be written in the form

$$M = (X - L)(I - X^{-1}U),$$

where $X = \mathrm{diag}(X_1, X_2, \dots, X_{n_1})$ and the blocks $X_i$ are determined by the
recursion

$$X_1 = C_{1,1}, \quad \text{and} \quad X_i = C_{i,i} - C_{i,i-1}X_{i-1}^{-1}C_{i-1,i}, \quad i = 2, \dots, n_1. \qquad (4)$$

It is easy to observe here that $X_i$ are also block-circulant matrices.

In order to compute $M^{-1}v$ we rewrite the block-circulant blocks of the
preconditioner as

$$C_{i,j} = (F \otimes F)\Lambda_{i,j}(F^* \otimes F^*).$$

Here $\otimes$ denotes the Kronecker product. It can be observed that for $X_i$ we have

$$X_i = (F \otimes F)D_i^{-1}(F^* \otimes F^*)$$

and the latter yields

$$D_1^{-1} = \Lambda_{1,1}, \qquad D_i^{-1} = \Lambda_{i,i} - \Lambda_{i,i-1}D_{i-1}\Lambda_{i-1,i}.$$

Let $\Lambda = \mathrm{tridiag}(\Lambda_{i,i-1}, \Lambda_{i,i}, \Lambda_{i,i+1})$. Then the following relation holds

$$Mu = v \iff (I \otimes F \otimes F)\,\Lambda\,(I \otimes F^* \otimes F^*)\,u = v.$$



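The above system can be rewritten as

$$\begin{pmatrix} \hat F & & & \\ & \hat F & & \\ & & \ddots & \\ & & & \hat F \end{pmatrix} \begin{pmatrix} \Lambda_{11} & \Lambda_{12} & & \\ \Lambda_{21} & \Lambda_{22} & \Lambda_{23} & \\ & \Lambda_{32} & \Lambda_{33} & \ddots \\ & & \ddots & \Lambda_{n_1 n_1} \end{pmatrix} \begin{pmatrix} \hat F^* & & & \\ & \hat F^* & & \\ & & \ddots & \\ & & & \hat F^* \end{pmatrix} \begin{pmatrix} u_1 \\ u_2 \\ u_3 \\ \vdots \\ u_{n_1} \end{pmatrix} = \begin{pmatrix} v_1 \\ v_2 \\ v_3 \\ \vdots \\ v_{n_1} \end{pmatrix}$$

where $\hat F = F \otimes F$.

We can distinguish three stages in computing $u = M^{-1}v$: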



1) $\hat v = (I \otimes F^* \otimes F^*)v$

2) $\Lambda\hat u = \hat v$ (5)

3) $u = (I \otimes F \otimes F)\hat u$.

Due to the special form of $F$ (see (1) above), we can use a fast Fourier
transform to perform the first and third stages of the algorithm. Namely, we use
a standard two-dimensional block-FFT which is easily parallelizable (see [6]).
The second stage consists of solving two recurrence equations

$$w_1 = D_1\hat v_1, \qquad w_i = D_i(\hat v_i - \Lambda_{i,i-1}w_{i-1}), \quad i = 2, 3, \dots, n_1,$$

$$\hat u_{n_1} = w_{n_1}, \qquad \hat u_i = w_i - D_i\Lambda_{i,i+1}\hat u_{i+1}, \quad i = n_1 - 1, n_1 - 2, \dots, 1. \qquad (6)$$

Since the blocks $D_i$ and $\Lambda_{i,j}$ in the recurrences (6) are diagonal, the solution of $n_2 n_3$
independent linear systems can be calculated in parallel.
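
Because all blocks in the second stage are diagonal, each Fourier frequency leads to an independent scalar tridiagonal system of size $n_1$. The sketch below shows the forward and backward recurrences for one such system. It is an illustration only: the list-based storage (`lower[i]` for the entry in row $i$, column $i-1$, `upper[i]` for row $i$, column $i+1$), the function names and the test data are assumptions for this example.

```python
def factor(diag, lower, upper):
    """D_1 = 1/diag[0]; D_i = 1/(diag[i] - lower[i] * D_{i-1} * upper[i-1])."""
    d = [1.0 / diag[0]]
    for i in range(1, len(diag)):
        d.append(1.0 / (diag[i] - lower[i] * d[i - 1] * upper[i - 1]))
    return d


def solve_tridiagonal(diag, lower, upper, v):
    """Solve the scalar tridiagonal system by the two recurrences."""
    n = len(diag)
    d = factor(diag, lower, upper)
    # Forward recurrence: w_1 = D_1 v_1, w_i = D_i (v_i - lower_i w_{i-1}).
    w = [d[0] * v[0]]
    for i in range(1, n):
        w.append(d[i] * (v[i] - lower[i] * w[i - 1]))
    # Backward recurrence: u_n = w_n, u_i = w_i - D_i upper_i u_{i+1}.
    u = [0.0] * n
    u[n - 1] = w[n - 1]
    for i in range(n - 2, -1, -1):
        u[i] = w[i] - d[i] * upper[i] * u[i + 1]
    return u


def apply_tridiagonal(diag, lower, upper, u):
    """Multiply the tridiagonal matrix by u (used only to check the solution)."""
    n = len(diag)
    out = []
    for i in range(n):
        s = diag[i] * u[i]
        if i > 0:
            s += lower[i] * u[i - 1]
        if i < n - 1:
            s += upper[i] * u[i + 1]
        out.append(s)
    return out


# Hypothetical data for one frequency; lower[0] and upper[-1] are unused.
diag = [4.0, 4.0, 4.0, 4.0]
lower = [0.0, -1.0, -1.0, -1.0]
upper = [-1.0, -1.0, -1.0, 0.0]
v = [1.0, 2.0, 3.0, 4.0]
u = solve_tridiagonal(diag, lower, upper, v)
print(u)
print(apply_tridiagonal(diag, lower, upper, u))
```

In a parallel implementation each processor would run these recurrences for its own subset of the $n_2 n_3$ frequencies, with no communication during this stage.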



3 Parallel Complexity 



Let us present the theoretical estimate of the total execution time $T_{PCG}$ for one PCG iteration for the proposed circulant block-factorization preconditioner on a parallel system with $p$ processors (a detailed analysis of the parallel complexity can be found in [4]). Each iteration consists of one matrix-vector multiplication involving the matrix $A$, one multiplication involving the inverse of the preconditioner $M_{CBF}$ (solving a system of equations with the matrix $M$), two inner products and three linked triads (a vector updated by a vector multiplied by a scalar). Consequently

$$T_{PCG}(p) = T_{mult} + T_{prec} + 2\,T_{inn\_prod} + 3\,T_{triads}.$$
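The operation count per iteration can be read off a textbook preconditioned conjugate gradient loop. The sketch below is a generic serial PCG written for illustration (the names `pcg`, `apply_A` and `apply_Minv` are hypothetical); it is not the authors' parallel implementation, and the residual norm used for the stopping test is an extra reduction that is not part of the tally above.

```python
import numpy as np

def pcg(apply_A, apply_Minv, b, tol=1e-10, max_iter=500):
    """Generic PCG; the comments mark the operations counted per iteration."""
    x = np.zeros_like(b, dtype=float)
    r = np.array(b, dtype=float)
    z = apply_Minv(r)
    p = z.copy()
    rz = r @ z
    for k in range(max_iter):
        q = apply_A(p)                 # one matrix-vector multiplication
        alpha = rz / (p @ q)           # inner product 1
        x += alpha * p                 # linked triad 1
        r -= alpha * q                 # linked triad 2
        if np.linalg.norm(r) < tol:    # stopping test (not counted above)
            return x, k + 1
        z = apply_Minv(r)              # one preconditioner solve
        rz_new = r @ z                 # inner product 2
        p = z + (rz_new / rz) * p      # linked triad 3
        rz = rz_new
    return x, max_iter
```

In the setting of this paper `apply_Minv` would be the three-stage FFT-based solve; any symmetric positive definite preconditioner can be substituted when experimenting.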



For simplicity we assume that the mesh dimensions are equal and that they are an exact power of two, i.e., $n_1 = n_2 = n_3 = n = 2^l$. We also assume that the time to execute $K$ arithmetic operations on one processor is $T_a = K\,t_a$, where $t_a$ is the average time of one arithmetic operation. In addition, the communication time of a transfer of $K$ words between two neighboring processors is $T_{local} = t_s + K\,t_c$, where $t_s$ is the start-up time and $t_c$ is the time for each word to be sent/received. Finally, let us assume that a radix-2 algorithm is used to calculate the FFTs and thus the cost per processor is $T_{FFT}(p) = 5n \log n \; t_a$.
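The three elementary cost assumptions are simple enough to write down directly. The helper names below are hypothetical, and the base-2 logarithm is an assumption suggested by the radix-2 FFT; the text itself only writes $\log n$.

```python
import math

def t_arith(K, t_a):
    """Time for K arithmetic operations on one processor: T_a = K * t_a."""
    return K * t_a

def t_local(K, t_s, t_c):
    """Time to send K words to a neighbor: T_local = t_s + K * t_c."""
    return t_s + K * t_c

def t_fft(n, t_a):
    """Radix-2 FFT cost model: 5 * n * log2(n) * t_a (log base assumed)."""
    return 5 * n * math.log2(n) * t_a
```

With these helpers one can see, for example, how a large start-up time `t_s` dominates `t_local` for short messages, which is the effect discussed at the end of this section.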
Then the formula for computational complexity has the form 

n^ n^ 

Tpcaip) = 5 (7 -k 41ogn) — ta + 4 + 2g( — ,p) -k 2g{p,p) + 2b{p), 

p p 

where $b(p)$ denotes the time to broadcast a single value from one processor to all other processors and $g(K,p)$ denotes the time to gather $K$ words from all processors into one processor. It can be shown that, for instance, when only the leading terms are taken into consideration, for the shared memory parallel computer the above function simplifies to

1 

Tpcoip) « + 2(1 ) — tc + 5(7 + 41ogn) — ta- (7) 

P P P 

Next we analyze the relative speedup $S_p$ and the relative efficiency $E_p$, where $S_p = \frac{T(1)}{T(p)} \le p$ and $E_p = \frac{S_p}{p} \le 1$. Thus the formula for the speedup becomes



Sp 



5(7 + 41ogn) 

2^|- + 2(l-i)|- + 5(7 + 41ogn)^- 



(8) 



Obviously, lim„^oo 5'p = P and lim„^oo = 1, i.e., the algorithm is asymp- 

2 , , 

totically optimal. More precisely, if logn then Ep approaches 1. 

Unfortunately, the start-up time $t_s$ is usually much larger than $t_a$, and for relatively small $n$ the first term of the denominator in (8) is significant; in this case the efficiency is much smaller than 1.



4 Experimental Results 

In this section we report the results of the experiments executed on Sun Ultra-Enterprise 3000, Cray J-9x and T3E, and SGI Origin 2000 computers and on two PC clusters. The code has been implemented in C and the parallelization has been facilitated using the MPI [7] library. In all cases the manufacturer-provided MPI kernels have been used. No machine-dependent optimization has been applied to the code itself. Instead, in all cases, the most aggressive optimization options of the compiler have been turned on. Times have been collected using the MPI-provided timer and, as verification, the clock Unix timer. Results reported by both timers were very close to each other. In all cases we report the best results from multiple runs in interactive and batch modes. In Table 1 we report results obtained on the Sun, the vector-Cray and the SGI computers for $n_1 = n_2 = n_3 = 64, 96, 128, 144, 160$ and for the numbers of processors $p$ that exactly divide the dimensions of the problem (a temporary limitation of the experimental code). The Sun has 8 processors. On the Cray J-9x and the SGI Origin we could effectively use only up to 16 processors. On the Cray, for larger problems, due to the memory limitation, we could not even use these 16 processors. We report time $T(p)$, speedup $S_p$ (calculated as the time on one processor divided by the time on $p$ processors), and efficiency $E_p$.
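The reported quantities follow directly from the measured times. A minimal sketch (the function names are hypothetical), checked against the Sun entries for n = 64 and p = 2 from Table 1:

```python
def speedup(t1, tp):
    """Relative speedup: time on one processor divided by time on p processors."""
    return t1 / tp

def efficiency(t1, tp, p):
    """Relative efficiency: speedup divided by the number of processors."""
    return speedup(t1, tp) / p

# Sun, n = 64: T(1) = 2.39 s and T(2) = 1.16 s
print(round(speedup(2.39, 1.16), 2), round(efficiency(2.39, 1.16, 2), 2))
```

Since the tabulated times are rounded to two decimals, values recomputed this way can differ from the printed speedups in the last digit for the shortest runs.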

A number of observations can be made. First, the proposed implementation, which follows the algorithm description in a natural way, is clearly not appropriate for the vector computer. To achieve a respectable performance on the Cray, a vector-oriented implementation would be necessary. Second, for small problems, the proposed approach parallelizes rather well on both shared memory (Sun) and dynamic shared (SMP) machines (SGI). However, as the problem size increases, the parallel efficiency of the Sun decreases. It can be assumed that this
Table 1. Parallel performance on the SUN Enterprise 3000 superserver, the Cray J-9x vector-computer and the SGI Origin 2000 dynamic shared memory parallel computer

                    SUN                    Cray J-9x                  SGI
  n    p     T(p)     Sp     Ep      T(p)     Sp     Ep      T(p)     Sp     Ep
 64    1     2.39                   14.07                    0.92
       2     1.16   2.06   1.03      7.32   1.92   0.96      0.46   2.00   1.00
       4     0.60   3.99   1.00      3.87   3.63   0.91      0.23   4.00   1.00
       8     0.31   7.66   0.96      2.21   6.36   0.80      0.12   7.66   0.96
      16                             1.86   7.56   0.47      0.09   9.38   0.64
 96    1    18.38                   44.81                    5.56
       2     9.02   2.04   1.02     23.10   1.93   0.97      2.75   2.02   1.01
       3     6.08   3.02   1.01     16.14   2.77   0.93      1.96   2.83   0.95
       4     4.68   3.93   0.98     12.04   3.72   0.93      1.38   4.02   1.01
       6     3.19   5.76   0.96      8.76   5.11   0.85      0.96   5.67   0.97
       8     2.90   6.34   0.79      6.79   6.59   0.82      0.74   7.51   0.94
      12                             5.38   8.32   0.69      0.54  10.29   0.86
      16                             5.61   7.98   0.50      0.47  11.82   0.74
128    1    27.67                  130.75                   10.64
       2    12.85   2.15   1.08     69.12   1.89   0.95      5.41   1.96   0.98
       4     9.33   2.97   0.74     35.36   3.69   0.92      3.11   3.42   0.86
       8     6.17   4.49   0.56     20.09   6.50   0.81      1.33   8.00   1.00
      16                            12.85  10.17   0.64      0.78  13.64   0.85
144    1    70.19                  167.55                   20.92
       2    35.21   1.99   1.00     96.23   1.74   0.87     10.64   1.96   0.98
       3    23.79   2.95   0.98     58.32   2.87   0.96      7.05   2.96   0.99
       4    21.52   3.26   0.82     47.37   3.53   0.88      5.57   3.75   0.94
       6    21.45   3.27   0.55     36.55   4.58   0.76      3.55   5.89   0.98
       8    15.39   4.56   0.57     34.76   4.82   0.60      2.67   7.83   0.98
      12                            31.82   5.26   0.44      1.84  11.36   0.95
      16                                                     1.46  14.32   0.90
160    1   112.66                  223.03                   31.85
       2    46.63   2.42   1.21    116.43   1.87   0.96     14.74   2.16   1.08
       4    24.39   4.62   1.15     61.60   3.77   0.91      7.34   4.33   1.08
       5    28.06   4.01   0.80     50.96   4.65   0.88      6.01   5.29   1.06
       8    21.36   5.27   0.66     36.48   6.83   0.76      3.84   8.29   1.04
      10                            32.51   8.43   0.69      2.99  10.65   1.07
      16                                                     2.01  15.84   0.99

is due to the communication overhead, which saturates the memory-processor pathways. In addition, the single processor performance follows the same pattern. While for n = 64 it takes the Sun twice as long to solve the problem, this ratio increases to almost four times longer for n = 160. This observation should also be related to the appearance of super-linear speedup. This effect is visible not only on the Sun, but also, for the largest problem, on the SGI. It has a relatively simple explanation: it has been observed many times that, on RISC-based hierarchical memory computers, efficiency rapidly decreases as the problem size increases (see for instance [1]).
In Table 2 we present the results of our experiments on the Cray T3E and the two PC clusters: the Beowulf cluster of sixteen 233 MHz PII processors and the Scali cluster of sixteen 450 MHz PIII processors. The reason for this combination



Table 2. Parallel performance on the Cray T3E and the PC clusters

                  Cray T3E              Beowulf cluster           Scali cluster
  n    p     T(p)     Sp     Ep      T(p)     Sp     Ep      T(p)     Sp     Ep
 64    1     1.39                    2.81                    0.90
       2     0.68   2.04   1.02      1.84   1.52   0.76      0.48   1.88   0.94
       4     0.35   3.97   0.99      1.01   2.78   0.70      0.25   3.60   0.90
       8     0.20   6.95   0.87      0.70   4.01   0.50      0.12   7.50   0.94
      16     0.11  12.63   0.79      0.49   5.73   0.36      0.06  15.00   0.94
 96    1     7.46                   17.06                    5.34
       2     3.74   1.99   1.00     10.14   1.68   0.84      2.75   1.94   0.96
       3     2.54   2.94   0.98      6.99   2.44   0.81      1.90   2.83   0.94
       4     1.90   3.92   0.98      5.31   3.21   0.80      1.42   3.76   0.93
       6     1.31   5.69   0.95      4.06   4.20   0.70      0.97   5.57   0.93
       8     0.98   7.61   0.95      3.26   5.23   0.65      0.73   7.31   0.91
      12     0.67  11.13   0.93      2.35   7.25   0.60      0.49  10.96   0.91
      16     0.52  14.34   0.90      1.98   8.61   0.54      0.37  14.43   0.89



is that the Cray in the NERSC center has only 256 Mbytes of memory per processor (which is exactly the same amount of memory as we had per node in both clusters) and thus we were able to run only the smaller problems on them. In addition, all three machines represent pure message passing environments.

The results are rather surprising. The Cray is only 3-4 times faster than the 233 MHz PII cluster and slower than the 450 MHz PIII cluster. It should also be added here that the code on the Beowulf was compiled using the GNU compiler, while the code on the Scali cluster was compiled using the Portland Group compiler, and thus the Beowulf results could have been somewhat better if the better quality compiler had been used. Observe also that for n = 96 the Beowulf cluster has a performance comparable to the Sun (see Table 1). Interestingly, the Scali cluster slightly outperforms the SGI supercomputer. It is a pity that the distributed memory machines did not have more memory per node, as it would be very interesting to find out whether this relationship also holds for larger problems.



5 Concluding Remarks 

In this note we have reported on the parallel performance of a new preconditioner applied to the conjugate gradient method used to solve a sparse linear system arising from a 3D elliptic model problem. We have shown that the code parallelizes well on a number of machines representing shared memory, dynamic shared memory (SMP) and message passing environments. In the near future we plan, first, to complete the performance studies by running our code on a number of additional machines (e.g. IBM SP2, HP SPP 1000, Compaq Alpha Cluster etc.). Second, we will extend our work to non-uniformly shaped domains, non-uniform discretizations, and situations in which the proposed approach is embedded in a solver for non-linear problems.
Schwarz Methods for Convection-Diffusion Problems*

H. MacMullen, E. O’Riordan, and G. I. Shishkin

Dublin, Ireland

Abstract. Several variants of Schwarz methods for a singularly perturbed two dimensional stationary convection-diffusion problem are constructed and analysed. The iteration counts, the errors in the discrete solutions and the convergence behaviour of the numerical solutions are analysed in terms of their dependence on the singular perturbation parameter of the Schwarz methods. Conditions for the methods to converge parameter uniformly and for the number of iterations to be independent of the perturbation parameter are discussed.



1 Introduction 

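Consider the following two dimensional convection-diffusion problem

$$-\varepsilon \Delta u_\varepsilon + a \cdot \nabla u_\varepsilon = f \quad \text{on } \Omega = (0,1) \times (0,1), \qquad (1a)$$

$$u_\varepsilon = g \quad \text{on } \partial\Omega, \qquad (1b)$$

$$a = (a_1, a_2) > (0, 0) \quad \text{on } \bar\Omega, \qquad (1c)$$

where $a$, $f$ and $g$ are sufficiently smooth and $f$ and $g$ are sufficiently compatible at the four corners.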

We wish to examine the suitability or otherwise of various Schwarz domain decomposition methods for this problem. It is well known that if one uses a monotone finite difference operator on an appropriately fitted piecewise-uniform mesh [2], the piecewise bilinear interpolant $\bar U_\varepsilon^N$ of the discrete solution satisfies the error bound $\|u_\varepsilon - \bar U_\varepsilon^N\| \le C N^{-1} \ln N$, where $C$ is a constant independent of $\varepsilon$. When an iterative numerical method is employed, both the discretization error and the iteration counts should be examined as functions of the small parameter $\varepsilon$. The number of mesh intervals in any coordinate direction is denoted by $N$ and $k$ denotes the iteration counter. Our goal is to design an iterative numerical method for (1) that satisfies an estimate of the form $\|u_\varepsilon - \bar U_\varepsilon^{[k]}\| \le C N^{-p} + C q^k$, $p > 0$, $q < 1$, where $C$, $p$ and $q$ are independent of $\varepsilon$ and $N$.

* This research was supported in part by the DCU Research Grant RC98-SPRJ- 
12EOR, by the Enterprise Ireland grant SC-98-612 and by the Russian Foundation 
for Basic Research under grant No. 98-01-00362. 
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2 One Dimensional Convection-Diffusion 

We begin by examining Schwarz methods for the one dimensional convection-diffusion problem

$$-\varepsilon u_\varepsilon'' + a(x)\,u_\varepsilon' = f, \quad x \in \Omega = (0,1), \qquad (2a)$$

$$u_\varepsilon(0) = A, \quad u_\varepsilon(1) = B, \qquad (2b)$$

where $a(x) \ge \alpha > 0$. The solution $u_\varepsilon^{[k]}$ of an overlapping continuous Schwarz method, described in [9], where the solution domain $\Omega$ is partitioned into two overlapping subdomains,

$$\Omega_0 = (0, \xi^+) \quad \text{and} \quad \Omega_1 = (\xi^-, 1),$$

satisfies the following error estimate. This is a well known result (see, for example, [1,8,7,3]).

Lemma 1. [9] Let $u_\varepsilon$ be the solution of (2) and let $\{u_\varepsilon^{[k]}\}$ be the sequence of Schwarz iterates. Then, for all $k \ge 1$,

where C is independent of k and e and 

< 1 . 

When the constants $\xi^+ = 1 - \tau$ and $\xi^- = 1 - 2\tau$ are chosen using the Shishkin transition point $\tau$,

$$\tau = \min\left\{\frac{1}{3},\ \frac{\varepsilon}{\alpha} \ln N\right\}, \qquad (3)$$

the solution of this method converges to the exact solution $u_\varepsilon$, independently of both $\varepsilon$ and $N$. However, in [6] numerical computations are presented which demonstrate that the discrete analogue of this method, in which uniform meshes discretise both subdomains, does not produce $\varepsilon$-uniform convergent approximations [5]. An alternative overlapping Schwarz method using a special piecewise
uniform mesh in the layer subdomain was proposed in [6] , and theoretical anal- 
ysis showed this method to be e-uniform. However, the width of the overlapping 
region was and so the iteration counts became large as the number of 

grid nodes increased. Also the analysis of this method was cumbersome. There- 
fore, we examined less complicated non-overlapping methods using uniform grids, 
with the intention of developing and analysing a two dimensional Schwarz ap- 
proach. We now describe two such methods and present theoretical convergence 
results. 
2.1 A Non-overlapping Schwarz Method with Dirichlet Boundary Conditions

The solution domain is partitioned into two non-overlapping subdomains,

$$\Omega_0 = (0, \xi^+) \quad \text{and} \quad \Omega_1 = (\xi^+, 1), \quad \text{where } \xi^+ = 1 - \tau,$$

and $\tau$ is given by (3). We also define $\xi^- = \frac{N-1}{N}(1 - \tau)$ to be the $(N-1)$th grid node on $\bar\Omega_0^N$. The method is now formally described as follows.

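Method 1. The exact solution $u_\varepsilon$ is approximated by the limit $\bar U_\varepsilon$ of a sequence of discrete Schwarz iterates $\{\bar U_\varepsilon^{[k]}\}_{k \ge 1}$, which are defined as follows. For each $k \ge 1$,

$$\bar U_\varepsilon^{[k]}(x) = \begin{cases} \bar U_0^{[k]}(x), & x \in \bar\Omega_0, \\ \bar U_1^{[k]}(x), & x \in \bar\Omega_1, \end{cases}$$

where $\bar U_i^{[k]}$ is the linear interpolant of $U_i^{[k]}$. Let $\bar\Omega_0^N = \{x_i\}_0^N$ be a uniform mesh on $\bar\Omega_0$ with $x_i = i\xi^+/N$ and $\bar\Omega_1^N = \{x_i\}_0^N$ be a uniform mesh on $\bar\Omega_1$ with $x_i = \xi^+ + i(1 - \xi^+)/N$. Then for $k = 1$

$$L_\varepsilon^N U_0^{[1]} = f \ \text{in } \Omega_0^N, \quad U_0^{[1]}(0) = u_\varepsilon(0), \quad U_0^{[1]}(\xi^+) = 0,$$

$$L_\varepsilon^N U_1^{[1]} = f \ \text{in } \Omega_1^N, \quad U_1^{[1]}(\xi^+) = U_0^{[1]}(\xi^-), \quad U_1^{[1]}(1) = u_\varepsilon(1),$$

and for $k > 1$

$$L_\varepsilon^N U_0^{[k]} = f \ \text{in } \Omega_0^N, \quad U_0^{[k]}(0) = u_\varepsilon(0), \quad U_0^{[k]}(\xi^+) = U_0^{[k-1]}(\xi^-),$$

$$L_\varepsilon^N U_1^{[k]} = f \ \text{in } \Omega_1^N, \quad U_1^{[k]}(\xi^+) = U_0^{[k]}(\xi^-), \quad U_1^{[k]}(1) = u_\varepsilon(1).$$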



The simple Dirichlet interface conditions, $U_1^{[k]}(\xi^+) = U_0^{[k]}(\xi^-)$ on $\Omega_1$, mean that the error reduction attained at $x = \xi^-$ when the method is applied in $\Omega_0$ is transferred to $\Omega_1$. Since no values are passed from $\Omega_1$ to $\Omega_0$, the problem of an accumulating error in the iteration process (see [5]) is avoided. The following theorem gives error estimates for this method.

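Theorem 1. [5] Assume $\tau = \frac{\varepsilon}{\alpha} \ln N < 1/3$. For all $k \ge 1$,

$$\|\bar U_\varepsilon^{[k]} - u_\varepsilon\| \le C N^{-1} (\ln N)^2 + C N^{-1} \sum_{j=1}^{k-1} \lambda^{-j} + C \lambda^{-k} \le C N^{-1} (\ln N)^2 + C\varepsilon + C\lambda^{-k},$$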

where C is a constant independent of k, N and e and A = 1 -|- . 

Consequently, this method is first order convergent for $\varepsilon \le N^{-1}$. However, at each iteration the error reduction, $\lambda^{-1}$, is proportional to $1 - N^{-1}$ and so the iteration counts become large for large $N$.
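The structure of the iteration in Method 1 can be illustrated with a small NumPy sketch. Everything below is an assumption made for illustration: the names `upwind_solve` and `schwarz_method1` are hypothetical, the discrete operator is taken to be the simple upwind scheme (backward difference for the convective term), the interface data follow the description above (the value at the node $\xi^-$ of $\Omega_0$ is reused as boundary data at $\xi^+$), and dense linear algebra is used for brevity.

```python
import numpy as np

def upwind_solve(x, eps, a, f, left, right):
    """Solve -eps*u'' + a(x)*u' = f on the uniform mesh x with Dirichlet
    data, using a backward (upwind) difference for u'."""
    h = x[1] - x[0]
    xi = x[1:-1]
    lower = -eps / h**2 - a(xi) / h
    diag = 2.0 * eps / h**2 + a(xi) / h
    upper = -eps / h**2 * np.ones(xi.size)
    A = np.diag(diag) + np.diag(lower[1:], -1) + np.diag(upper[:-1], 1)
    rhs = f(xi).astype(float)
    rhs[0] -= lower[0] * left
    rhs[-1] -= upper[-1] * right
    U = np.empty_like(x)
    U[0], U[-1] = left, right
    U[1:-1] = np.linalg.solve(A, rhs)
    return U

def schwarz_method1(N, eps, alpha, a, f, A_left, B_right,
                    tol=1e-10, max_iter=200):
    """Non-overlapping Schwarz iteration on (0, xi+) and (xi+, 1)."""
    tau = min(1.0 / 3.0, eps / alpha * np.log(N))  # transition point (3)
    xi_plus = 1.0 - tau
    x0 = np.linspace(0.0, xi_plus, N + 1)
    x1 = np.linspace(xi_plus, 1.0, N + 1)
    g = 0.0              # k = 1: homogeneous data at xi+ on Omega_0
    U0_old = None
    for k in range(1, max_iter + 1):
        U0 = upwind_solve(x0, eps, a, f, A_left, g)
        # Omega_1 receives the value at xi- (node N-1 of Omega_0).
        U1 = upwind_solve(x1, eps, a, f, U0[N - 1], B_right)
        if U0_old is not None and np.max(np.abs(U0 - U0_old)) < tol:
            break
        U0_old = U0
        g = U0[N - 1]    # next iterate reuses the value at xi-
    return x0, U0, x1, U1, k
```

At convergence the last two nodal values on $\Omega_0$ coincide, so the limit satisfies a discrete zero-derivative condition at $\xi^+$; this is consistent with the remark in Section 2.2 that Method 2 produces equivalent approximations.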
2.2 A Non-overlapping, Non-iterative Schwarz Method

In this section we discuss a Schwarz method which uses the same domain structure as described in the previous section and applies the Neumann interface condition $D^- U_0(\xi^+) = 0$ on $\Omega_0$. Therefore, no solution values are passed between subdomains and the method does not iterate. This approach uses a minimum amount of information and yet, for a singularly perturbed problem, produces accurate approximations for small $\varepsilon$. The method is defined as follows.

Method 2. The exact solution $u_\varepsilon$ is approximated by $\bar U_\varepsilon$, which is defined as follows:

$$\bar U_\varepsilon(x) = \begin{cases} \bar U_0(x), & x \in \bar\Omega_0, \\ \bar U_1(x), & x \in \bar\Omega_1, \end{cases}$$

where $\bar U_i$ is the linear interpolant of $U_i$. Let $\bar\Omega_0^N = \{x_i\}_0^N$ be a uniform mesh on $\bar\Omega_0$ with $x_i = i\xi^+/N$ and $\bar\Omega_1^N = \{x_i\}_0^N$ be a piecewise uniform mesh on $\bar\Omega_1$ with $x_i = \xi^+ + i(1 - \xi^+)/N$. Then

$$L_\varepsilon^N U_0 = f \ \text{in } \Omega_0^N, \quad U_0(0) = u_\varepsilon(0), \quad D^- U_0(\xi^+) = 0,$$

$$L_\varepsilon^N U_1 = f \ \text{in } \Omega_1^N, \quad U_1(\xi^+) = U_0(\xi^+), \quad U_1(1) = u_\varepsilon(1).$$

The following theorem states the convergence behaviour for this approach.

Theorem 2. [5] Assume $\tau < 1/3$. Then

$$\|\bar U_\varepsilon - u_\varepsilon\| \le C N^{-1} (\ln N)^2 + C\varepsilon,$$

where $C$ is a constant independent of $N$ and $\varepsilon$.

We remark that numerical experiments have been carried out in [5] which demonstrate that the numerical approximations produced by this method are equivalent to those from Method 1.

Finally, Methods 1 and 2, although efficient for small values of $\varepsilon$, fail to be convergent for $\varepsilon > N^{-1}$. This is due to the mismatch between the interface conditions and the true solution for large values of $\varepsilon$. Numerical experiments are presented in [5] for a non-overlapping Schwarz method which uses the interface condition,

7?+c/W(r) = = ut\e)- 

These computations show that the numerical solutions converge for larger values of $\varepsilon$ than the solutions of Methods 1 and 2. However, an important observation is that the number of iterations required by this method is proportional to $N$, and so the computational cost grows with the number of grid nodes.

Remark 1. Recently, there has been considerable interest in examining various different types of interface conditions (e.g., Robin-Robin or Dirichlet-Robin) for singularly perturbed convection-diffusion problems (see, for example, [10,4,11] and the references therein). However, in many cases, the analysis is restricted to the continuous Schwarz algorithm and/or the errors are examined in an $L_2$-norm, as opposed to a pointwise norm. Note that the $L_2$-norm is an inappropriate norm for singularly perturbed problems [2]. In this paper, we determine the explicit dependence of the pointwise errors and the iteration counts on both $\varepsilon$ and $N$ for two discrete Schwarz algorithms. These methods are not optimal, but are of theoretical interest. We expose the explicit error bounds that highlight the difference between the discrete and the continuous Schwarz algorithms and display the intricate nature of how the pointwise discretization error depends on the three variables $\varepsilon$, $k$ and $N$.



3 Two Dimensional Convection-Diffusion 



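We extend Method 1 to the two dimensional problem (1). The solution domain $\Omega = (0,1)^2$ is partitioned into four non-overlapping subdomains $\Omega_a$, $\Omega_b$, $\Omega_c$ and $\Omega_d$ defined by

$$\Omega_a = (0, 1 - \tau_1) \times (0, 1 - \tau_2), \qquad \Omega_b = (1 - \tau_1, 1) \times (0, 1 - \tau_2),$$

$$\Omega_c = (0, 1 - \tau_1) \times (1 - \tau_2, 1), \qquad \Omega_d = (1 - \tau_1, 1) \times (1 - \tau_2, 1),$$

where the transition parameters $\tau_1$, $\tau_2$ are chosen to satisfy, for $i = 1$ and $i = 2$,

$$\tau_i = \min\left\{\frac{1}{2},\ \frac{\varepsilon}{\alpha_i} \ln N\right\}.$$

We use the notation $\xi_1^+ = 1 - \tau_1$, $\xi_2^+ = 1 - \tau_2$, $\xi_1^- = \frac{N-1}{N}(1 - \tau_1)$ and $\xi_2^- = \frac{N-1}{N}(1 - \tau_2)$.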

,0 _ fAAi 



Method 3. For each k > 1, Ue (x^y) = Ul (x, y), (x, y) € fii, i = a,b^c,d 

where is the bilinear interpolant of . Let = {xt,yj} be a uniform 
mesh on S2i . On 17^ \ T) , C/|^^ = g V fc > 1 . Then for k = 1 



,N 
a 1 



Lft/W=/ 
Lft/W=/ 
Tft/W=/ znn^, 



U^a\xi,yj) = <F on Fa 

= F, c/W(e+ y,) = t/W(er,%) 

Lft/W=/ = = 

Then for k > 1, 

Lft/W=/ mi7f, ul^\xi,f+) = ult-^\xi,f+) 

Lft/f =/ ^n^2^, Ul^\xi,f+) = ut^\xi,f+), t/f(ei+,%) = CW(er,%) 
Lft/w = / ui>^\ft,yj) = ut"\^t.yj). UV^Hx.,^+) = Ul^\x.,ff) 

Lft/f = / m 
where $W$ is some arbitrary function with sufficient smoothness.

The following convergence result for the numerical solution generated is derived 
in [5], using comparison principle arguments and appropriate estimates on the 
solution components. 

Theorem 3. Assume Ti < 1/3. For k>l, 

- ttsll < CN~\lnNf + + Ce 

where = 1 -|- f = 1, 2 and C is a constant independent of k,N and e. 

This theorem reveals a natural extension of Method 1 to two dimensions. 

Remark 2. A parameter uniform overlapping method can be designed in which 
the overlapping regions would be fixed independently of e and N, and a Shishkin 
fitted mesh placed in the subdomains containing layer regions. For the problem 
class 1, this Schwarz method would have no advantages over the fitted non- 
iterative Shishkin mesh. However, a problem involving a complex domain struc- 
ture in higher dimensions, in which a fitted mesh may not be viable, may require 
this type of Schwarz approach. 



3.1 Numerical Results 

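Numerical computations are carried out on the following problem for a sequence of meshes $\bar\Omega_i^N$, $i = a, b, c, d$, corresponding to $N = 4, 8, 16, 32, 64, 128$:

$$\varepsilon \Delta u_\varepsilon + (2 + x^2 y)\,u_x + (1 + xy)\,u_y = x^2 + y^3 + \cos(x + 2y), \qquad (4a)$$

with boundary conditions

$$u(x, 0) = 0, \qquad u(x, 1) = \begin{cases} 4x(1 - x), & x < 1/2, \\ 1, & x \ge 1/2, \end{cases} \qquad (4b)$$

$$u(0, y) = 0, \qquad u(1, y) = \begin{cases} 8(y - 2y^2), & y < 1/4, \\ 1, & y \ge 1/4. \end{cases} \qquad (4c)$$

In Figure 1 the numerical solution $\bar U_\varepsilon$, with $N = 16$ intervals in each subdomain and $\varepsilon = 0.001$, is shown. The orders of convergence presented in Table 1 are computed using the double mesh principle (see [2]),

$$p_\varepsilon^N = \log_2 \frac{D_\varepsilon^N}{D_\varepsilon^{2N}}, \qquad \text{where } D_\varepsilon^N = \max_{x_i \in \bar\Omega^N} \left| \bar U_\varepsilon^N(x_i) - \bar U_\varepsilon^{2N}(x_i) \right|.$$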
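The double mesh estimate of the order of convergence reduces to a single logarithm once the two-mesh differences are available. A minimal sketch (the function name is hypothetical, and the differences are assumed to have been computed beforehand by comparing the solutions on the meshes with N and 2N intervals):

```python
import math

def double_mesh_order(d_N, d_2N):
    """Computed order of convergence p = log2(D^N / D^{2N}), where D^N is the
    maximum pointwise difference between the solutions on N and 2N intervals."""
    return math.log2(d_N / d_2N)

# A difference that halves when N doubles indicates first order convergence.
print(double_mesh_order(0.1, 0.05))
```

A computed order that settles near 1 as N grows, for every value of the perturbation parameter, is the experimental signature of the first order behaviour described by the theorems above.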



In Table 2, the required iteration counts are given for a tolerance level of 
max \Ul^\xi) - C/f"^'(xi)| < 10”®. 

These results show experimentally that, for small values of e, this method pro- 
duces accurate numerical approximations and is computationally efficient. 
Table 1. Computed orders of convergence for Method 3 applied to Problem (4) for various values of $\varepsilon$ and $N$, where $N$ is the number of intervals in each subdomain



N 

£ 3 8 IB 3T 

2““ 0.08 0.24 0.45 0.57 
2“’’ 0.09 0.28 0.52 0.63 
2“® 0.08 0.31 0.59 0.73 
2 ~^ 0.06 0.32 0.63 0.80 
2~^“ 0.05 0.31 0.65 0.84 
2^“ 0.05 0.31 0.65 0.85 
2"i^ 0.04 0.31 0.65 0.85 



2~^'* 0.04 0.31 0.65 0.85 



Table 2. Iteration count for Method 3 applied to Problem (4) for various values of $\varepsilon$ and $N$, where $N$ is the number of intervals in each subdomain

                      N
  ε          4    8   16   32   64  128
  2^-6       7    9   12   18   28   46
  2^-7       6    8   10   13   18   28
  2^-8       6    7    8   10   13   18
  2^-9       5    6    7    8   10   13
  2^-10      5    5    6    7    8   10
  2^-11      4    5    5    6    7    8
  2^-12      4    4    5    5    6    7
  2^-13      4    4    4    5    5    6
  2^-14      4    4    4    4    5    5
  2^-15      4    5    4    4    4    5
  2^-16      3    4    6    4    4    4
  2^-17      3    3    4    6   11   22
  2^-18      3    3    3    4    6   10
  2^-19      3    3    3    3    4    5




Fig. 1. Numerical solution of problem (4) with $N = 16$ and $\varepsilon = 0.001$
Abstract. The convergence of the Monte Carlo method for numerical integration can often be improved by replacing pseudorandom numbers (PRNs) with more uniformly distributed numbers known as quasirandom numbers (QRNs). Standard Monte Carlo methods use pseudorandom sequences and provide a convergence rate of $O(N^{-1/2})$ using $N$ samples. Quasi-Monte Carlo methods use quasirandom sequences, with the resulting convergence rate for numerical integration as good as $O((\log N)^k N^{-1})$.

In this paper we study the possibility of using QRNs for computing matrix-vector products, solving systems of linear algebraic equations and calculating the extreme eigenvalues of matrices. Several algorithms using the same Markov chains with different random variables are described. We have shown, theoretically and through numerical tests, that the use of quasirandom sequences improves both the magnitude of the error and the convergence rate of the corresponding Monte Carlo methods. Numerical tests are performed on sparse matrices using PRNs and Sobol, Halton, and Faure QRNs.



1 Introduction 

Monte Carlo methods (MCMs) are based on the simulation of stochastic processes whose expected values are equal to computationally interesting quantities. Despite the universality of MCMs, a serious drawback is their slow convergence, which is based on the $O(N^{-1/2})$ behavior of the size of statistical sampling errors. This represents a great opportunity for researchers in computational science. Even modest improvements in the MCM can have a substantial impact on the efficiency and range of applicability of MCMs. Much of the effort in the development of Monte Carlo methods has been in the construction of variance reduction methods, which speed up the computation by reducing the constant in front of the $O(N^{-1/2})$. An alternative approach to acceleration is to change the choice of sequence and hence improve the behavior with $N$. Quasi-Monte Carlo methods (QMCMs) use quasirandom (also known as low-discrepancy) sequences instead of pseudorandom sequences.

QRNs are constructed to minimize a measure of their deviation from uniformity called discrepancy. There are many different discrepancies, but let us consider the most common, the star discrepancy. Let us define the star discrepancy of a one-dimensional point set, $\{x_n\}_{n=1}^N$, by

$$D_N^* = D_N^*(x_1, \ldots, x_N) = \sup_{0 \le u \le 1} \left| \frac{1}{N} \sum_{n=1}^{N} \chi_{[0,u)}(x_n) - u \right| \qquad (1)$$

where $\chi_{[0,u)}$ is the characteristic function of the half open interval $[0,u)$. The mathematical motivation for quasirandom numbers can be found in the classic Monte Carlo application of numerical integration. We detail this for the trivial example of one-dimensional integration for illustrative simplicity.

Theorem (Koksma-Hlawka, [6]): if $f(x)$ has bounded variation, $V(f)$, on $[0,1)$, and $x_1, \ldots, x_N \in [0,1]$ have star discrepancy $D_N^*$, then:

$$\left| \frac{1}{N} \sum_{n=1}^{N} f(x_n) - \int_0^1 f(x)\,dx \right| \le V(f)\,D_N^*. \qquad (2)$$
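Both quantities in the inequality are easy to compute in one dimension, which makes the bound simple to check numerically. The sketch below is illustrative only: the function name is hypothetical, and it uses the standard closed form for the one-dimensional star discrepancy of a sorted point set, $D_N^* = \frac{1}{2N} + \max_i \left| x_{(i)} - \frac{2i-1}{2N} \right|$, which is not stated in the text above.

```python
import numpy as np

def star_discrepancy_1d(points):
    """Star discrepancy of a point set in [0, 1], via the sorted-points
    closed form D* = 1/(2N) + max_i |x_(i) - (2i - 1)/(2N)|."""
    x = np.sort(np.asarray(points, dtype=float))
    n = x.size
    i = np.arange(1, n + 1)
    return 1.0 / (2 * n) + np.max(np.abs(x - (2 * i - 1) / (2.0 * n)))

# Check the Koksma-Hlawka bound for f(x) = x^2, whose variation on [0, 1] is 1
# and whose integral is 1/3, using the equally spaced points n/(N+1).
N = 4
pts = np.arange(1, N + 1) / (N + 1.0)
quad_error = abs(np.mean(pts**2) - 1.0 / 3.0)
bound = 1.0 * star_discrepancy_1d(pts)
```

For this small example the quadrature error is well below the bound, as the theorem guarantees; the bound is a worst-case estimate over all functions with the given variation.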



The star discrepancy of a point set of $N$ truly random numbers in one dimension is $O(N^{-1/2} (\log \log N)^{1/2})$, while the discrepancy of $N$ quasirandom numbers can be as low as $N^{-1}$.$^1$ In $s > 3$ dimensions it is rigorously known that the discrepancy of a point set with $N$ elements can be no smaller than a constant depending only on $s$ times $N^{-1} (\log N)^{(s-1)/2}$. This remarkable result of Roth, [10], has motivated mathematicians to seek point sets and sequences with discrepancies as close to this lower bound as possible. Since Roth’s result, there have been many constructions of low discrepancy point sets that have achieved star discrepancies as small as $O(N^{-1} (\log N)^{s-1})$. Most notably there are the constructions of Hammersley, Halton, [5], Sobol, [11], Faure, [3], and Niederreiter, [9].

While QRNs do improve the convergence of applications like numerical integra- 
tion, it is by no means trivial to enhance the convergence of all MCMs. In fact, 
even with numerical integration, enhanced convergence is by no means assured 
in all situations with the naive use of quasirandom numbers, [1,8]. 

In this paper we study the applicability of quasirandom sequences for solving 
some linear algebra problems. We have already produced encouraging theoretical 
and empirical results with QMCMs for linear algebra problems and we believe 
that this initial work can be improved. 



Solving Systems of Linear Algebraic Equations via Neumann Series

Assume that a system of linear algebraic equations (SLAE) can be transformed into the following form: $x = Ax + \varphi$, where $A$ is a real square, $n \times n$, matrix, $x = (x_1, x_2, \ldots, x_n)^t$ is the $1 \times n$ solution vector and $\varphi = (\varphi_1, \varphi_2, \ldots, \varphi_n)^t$ is the given right-hand side vector.$^2$ In addition, assume that $A$ satisfies either the condition $\max_{1 \le i \le n} \sum_{j=1}^{n} |a_{ij}| < 1$, or that all the eigenvalues of $A$ lie within the unit circle.

$^1$ Of course, the $N$ optimal quasirandom points in $[0, 1)$ are the obvious: $\frac{1}{N+1}, \frac{2}{N+1}, \ldots, \frac{N}{N+1}$.

Now consider the sequence $x^{(1)}, x^{(2)}, \ldots$ defined by the following recursion:

$$x^{(k)} = A x^{(k-1)} + \varphi, \qquad k = 1, 2, \ldots.$$

Given an initial vector $x^{(0)}$, the approximate solution to the system $x = Ax + \varphi$ can be developed via a truncated Neumann series:

$$x^{(k)} = \varphi + A\varphi + A^2\varphi + \ldots + A^{k-1}\varphi + A^k x^{(0)}, \qquad k > 0, \qquad (3)$$

with a truncation error of $x^{(k)} - x = A^k (x^{(0)} - x)$.

This iterative process (3) of applying the matrix $A$ repeatedly is the basis for deriving a Monte Carlo approach for this problem.
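The deterministic recursion is a one-line loop. The sketch below (function name hypothetical, small test matrix chosen arbitrarily so that its row sums are below 1) shows the iterate approaching the solution of $x = Ax + \varphi$:

```python
import numpy as np

def neumann_iterate(A, phi, x0, k):
    """Apply x <- A x + phi k times; equals the truncated Neumann series
    phi + A phi + ... + A^(k-1) phi + A^k x0."""
    x = np.array(x0, dtype=float)
    for _ in range(k):
        x = A @ x + phi
    return x

A = np.array([[0.1, 0.2], [0.3, 0.1]])
phi = np.array([1.0, 2.0])
x_k = neumann_iterate(A, phi, np.zeros(2), 60)
x_direct = np.linalg.solve(np.eye(2) - A, phi)
```

The truncation error shrinks like the k-th power of the spectral radius of A, so the two vectors above agree to machine precision after a few dozen iterations.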

The Monte Carlo Method

Consider the problem of evaluating the inner product of a given vector, $g$, with the vector solution of the considered system

$$(g, x) = \sum_{\alpha=1}^{n} g_\alpha x_\alpha. \qquad (4)$$

To solve this problem via a MCM (see, for example, [12]) one has to construct a random process with mean equal to the solution of the desired problem. This requires the construction of a finite-state Markov chain. Consider the following Markov chain:

$$k_0 \to k_1 \to \ldots \to k_i, \qquad (5)$$

where $k_j = 1, 2, \ldots, n$ for $j = 1, \ldots, i$ are natural numbers. The rules for constructing the chain (5) are:

$$P(k_0 = \alpha) = p_\alpha, \qquad P(k_j = \beta \mid k_{j-1} = \alpha) = p_{\alpha\beta},$$

where $p_\alpha$ is the probability that the chain starts in state $\alpha$ and $p_{\alpha\beta}$ is the transition probability from state $\alpha$ to state $\beta$. The probabilities $p_{\alpha\beta}$ define a transition matrix $P$. We require that $\sum_{\alpha=1}^{n} p_\alpha = 1$, $\sum_{\beta=1}^{n} p_{\alpha\beta} = 1$ for any $\alpha = 1, 2, \ldots, n$, and that the distribution $(p_1, \ldots, p_n)^t$ is permissible to the vector $g$ and similarly the distribution $p_{\alpha\beta}$ is permissible to $A$ [12]. Common constructions are to choose $p_{\alpha\beta} = \frac{|a_{\alpha\beta}|}{\sum_{\beta} |a_{\alpha\beta}|}$ for $\alpha, \beta = 1, 2, \ldots, n$, which corresponds to an importance sampling MCM (MCM with a reduced variance), or to choose $p_{\alpha\beta} = 1/n$ for $\alpha, \beta = 1, 2, \ldots, n$, which corresponds to the standard MCM.

Now define the random variables $\theta[g]$:

$$\theta[g] = \frac{g_{k_0}}{p_{k_0}} \sum_{j=0}^{\infty} W_j \varphi_{k_j}, \qquad (6)$$

where $W_0 = 1$, $W_j = W_{j-1} \frac{a_{k_{j-1} k_j}}{p_{k_{j-1} k_j}}$.

$^2$ If we consider a given system $Lx = b$, then it is possible to find a non-singular matrix, $M$, such that $ML = I - A$ and $Mb = \varphi$. Thus without loss of generality the system $Lx = b$ can always be recast as $x = Ax + \varphi$.

It is known [12] that the mathematical expectation $E[\theta[g]]$ of the random variable $\theta[g]$ is:

$$E[\theta[g]] = (g, x).$$

The partial sum corresponding to (6) is defined as $\theta_i[g] = \frac{g_{k_0}}{p_{k_0}} \sum_{j=0}^{i} W_j \varphi_{k_j}$. Thus the Monte Carlo estimate for $(g, x)$ is $(g, x) \approx \frac{1}{N} \sum_{s=1}^{N} \theta_i[g]_s$, where $N$ is the number of chains and $\theta_i[g]_s$ is the value of $\theta_i[g]$ taken over the $s$-th chain, with a statistical error of size $O(\mathrm{Var}(\theta_i)^{1/2} N^{-1/2})$.
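A direct, unoptimized transcription of this estimator may help fix ideas. The sketch below is an illustration under stated assumptions, not the authors' code: the function name and argument layout are hypothetical, pseudorandom numbers from NumPy are used, the initial distribution is taken proportional to $|g|$, the transition probabilities use the importance sampling choice described above, and every row of $A$ is assumed to contain at least one nonzero entry.

```python
import numpy as np

def mc_inner_product(A, phi, g, n_chains, chain_len, rng):
    """Monte Carlo estimate of (g, x) for x = A x + phi using Markov chains
    of fixed length chain_len (truncated Neumann series)."""
    n = A.shape[0]
    absA = np.abs(A)
    P = absA / absA.sum(axis=1)[:, None]   # p_ab = |a_ab| / sum_b |a_ab|
    p0 = np.abs(g) / np.abs(g).sum()       # initial distribution (assumed)
    total = 0.0
    for _ in range(n_chains):
        k = rng.choice(n, p=p0)
        scale = g[k] / p0[k]
        W = 1.0                            # W_0 = 1
        theta = phi[k]
        for _ in range(chain_len):
            k_next = rng.choice(n, p=P[k])
            W *= A[k, k_next] / P[k, k_next]   # W_j = W_{j-1} a / p
            k = k_next
            theta += W * phi[k]
        total += scale * theta
    return total / n_chains

A = np.array([[0.1, 0.2], [0.3, 0.1]])
phi = np.array([1.0, 2.0])
g = np.array([1.0, 0.0])                   # picks out the first component of x
estimate = mc_inner_product(A, phi, g, 5000, 15, np.random.default_rng(0))
```

Replacing the pseudorandom generator by a low-discrepancy sequence, which is the subject of this paper, requires mapping each chain to one point of a multidimensional quasirandom sequence; that step is not shown here.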

Computing the Extremal Eigenvalues

Let $A$ be a large, sparse $n \times n$ matrix. Consider the problem of computing one or more eigenvalues of $A$, i.e., the values of $\lambda$ for which $Au = \lambda u$ holds. Suppose the eigenvalues are ordered $|\lambda_1| > |\lambda_2| \ge \ldots \ge |\lambda_{n-1}| > |\lambda_n|$. There are two deterministic numerical methods that can efficiently compute only the extremal eigenvalues: the power method and Lanczos-type methods. (Note that the Lanczos method is applicable only to symmetric eigenproblems, [4].)

Computational Complexity: If $k$ iterations are required for convergence, the number of arithmetic operations is $O(kn^2)$ for the power method and $O(n^3 + kn^2)$ for both the inverse and inverse shifted power method.



The Monte Carlo Method 

Consider MCMs based on the power method. When computing eigenvalues, we
work with the matrix $A$ and its resolvent matrix $R_q = [I - qA]^{-1} \in \mathbb{R}^{n \times n}$. If
$|q\lambda| < 1$, $R_q^m$ may be expanded as a series via the binomial theorem:

$$[I - qA]^{-m} = \sum_{i=0}^{\infty} q^i C_{m+i-1}^{i} A^i, \qquad |q\lambda| < 1. \tag{7}$$
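
The binomial expansion of the resolvent power can be checked numerically in the scalar case, where $A$ is replaced by a single eigenvalue $\lambda$. The sketch below assumes that $C_{m+i-1}^{i}$ denotes the binomial coefficient "$m+i-1$ choose $i$"; the function name and the sample values of $q$, $\lambda$ and $m$ are hypothetical.

```python
from math import comb


def resolvent_power_series(q, lam, m, terms):
    """Partial sum of sum_i q^i C(m+i-1, i) lam^i, the scalar analogue of (7)."""
    return sum(comb(m + i - 1, i) * (q * lam) ** i for i in range(terms))


q, lam, m = 0.5, 0.8, 3          # |q * lam| = 0.4 < 1, so the series converges
approx = resolvent_power_series(q, lam, m, terms=200)
exact = (1.0 - q * lam) ** (-m)  # closed form (1 - q*lam)^(-m)
```

For $|q\lambda|$ close to 1 many more terms are needed, which is why the length of the truncated series matters in the resolvent variant of the method.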



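The eigenvalues of the matrices $R_q$ and $A$ are connected by the equality $\mu = \frac{1}{1 - q\lambda}$,
and the eigenvectors of the two matrices coincide^2. Let $f \in \mathbb{R}^n$, $h \in \mathbb{R}^n$.
Applying the power method ([2]) leads to the following iterative processes:

$$\lambda^{(m)} = \frac{(h, A^m f)}{(h, A^{m-1} f)} \xrightarrow[m \to \infty]{} \lambda_{\max}, \tag{8}$$

$$\mu^{(m)} = \frac{([I - qA]^{-m} f, h)}{([I - qA]^{-(m-1)} f, h)} \xrightarrow[m \to \infty]{} \mu_{\max} = \frac{1}{1 - q\lambda}. \tag{9}$$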

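Construct the same Markov chain as before with the initial density vector, $p =
\{p_\alpha\}_{\alpha=1}^{n}$, and the transition density matrix, $P = \{p_{\alpha\beta}\}_{\alpha,\beta=1}^{n}$. Define the following
random variable: $W_0 = \frac{h_{k_0}}{p_{k_0}}$, $W_j = W_{j-1} \frac{a_{k_{j-1}k_j}}{p_{k_{j-1}k_j}}$, $j = 1, \ldots, i$. Writing $f(x_i)$ for the
component of $f$ at the $i$-th state of the chain, this has the desired expected values ([2]):

$$E[W_i f(x_i)] = (h, A^i f), \quad i = 1, 2, \ldots,$$

$$E\Big[\sum_{i=0}^{\infty} q^i C_{m+i-1}^{i} W_i f(x_i)\Big] = (h, [I - qA]^{-m} f), \quad m = 1, 2, \ldots,$$

and allows us to estimate the desired eigenvalues as:

$$\lambda_{\max} \approx \frac{E[W_i f(x_i)]}{E[W_{i-1} f(x_{i-1})]} \tag{10}$$

and

$$\lambda \approx \frac{1}{q}\Big(1 - \frac{1}{\mu^{(m)}}\Big) = \frac{E[\sum_{i=1}^{l} q^{i-1} C_{i+m-2}^{i-1} W_i f(x_i)]}{E[\sum_{i=0}^{l} q^{i} C_{i+m-1}^{i} W_i f(x_i)]}. \tag{11}$$

^2 If $q > 0$, the largest eigenvalue $\mu_{\max}$ of the resolvent matrix corresponds to the
largest eigenvalue, $\lambda_{\max}$, of the matrix $A$, but if $q < 0$, then $\mu_{\max}$ corresponds to
the smallest eigenvalue, $\lambda_{\min}$, of the matrix $A$.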



We remark that in (10) the length of the Markov chain, $l$, is equal to the number
of iterations, $i$, in the power method. However, in (11) the length of the Markov
chain is equal to the number of terms in the truncated series for the resolvent matrix.
In this second case the parameter $m$ corresponds to the number of iterations.
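
A minimal sketch of the power-method estimator, read here as the ratio of the sample means of $W_i f(x_i)$ and $W_{i-1} f(x_{i-1})$, is given below. It assumes importance sampling with initial density proportional to $|h|$ and transition density proportional to $|a_{\alpha\beta}|$, no zero rows in $A$, and pseudorandom numbers from Python's `random` module; the function name and the test matrix are hypothetical.

```python
import random


def mc_dominant_eigenvalue(a, h, f, steps, chains, rng):
    """Ratio of sample means of W_i f(x_i) and W_{i-1} f(x_{i-1}) over many chains."""
    n = len(a)
    states = range(n)
    # Initial density proportional to |h_alpha|.
    total_h = sum(abs(x) for x in h)
    p0 = [abs(x) / total_h for x in h]
    # Row sums of |a| normalise the importance-sampling transition density.
    row_sums = [sum(abs(x) for x in row) for row in a]
    numerator = denominator = 0.0
    for _ in range(chains):
        k = rng.choices(states, weights=p0)[0]
        w = h[k] / p0[k]                      # W_0
        previous = 0.0
        for _ in range(steps):
            probs = [abs(x) / row_sums[k] for x in a[k]]
            nxt = rng.choices(states, weights=probs)[0]
            previous = w * f[k]               # W_{j-1} f at the current state
            w *= a[k][nxt] / probs[nxt]       # W_j = W_{j-1} a / p
            k = nxt
        numerator += w * f[k]                 # W_i f(x_i)
        denominator += previous               # W_{i-1} f(x_{i-1})
    return numerator / denominator


# Positive matrix with constant row sums 6, so lambda_max = 6 and every weight
# ratio W_j / W_{j-1} equals 6 under importance sampling.
a = [[1.0, 2.0, 3.0], [3.0, 1.0, 2.0], [2.0, 3.0, 1.0]]
ones = [1.0, 1.0, 1.0]
estimate = mc_dominant_eigenvalue(a, ones, ones, steps=5, chains=100,
                                  rng=random.Random(0))
```

For a general matrix the weights vary from chain to chain and the estimate carries a statistical error; the constant-row-sum example is chosen only because its outcome does not depend on the random numbers drawn.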



Table 1. Monte Carlo estimations using PRNs and QRN sequences for the
dominant eigenvalue of matrices of size 128 and 2000

                              PRN       QRN(Faure)  QRN(Sobol)  QRN(Halton)
Estimated λ_max (n = 128)     61.2851   63.0789     63.5916     65.1777
Relative error                0.0424    0.0143      0.0063      0.0184
Estimated λ_max (n = 2000)    58.8838   62.7721     65.2831     65.377
Relative error                0.0799    0.01918     0.0200      0.0215



Quasi-Monte Carlo Methods for Matrix Computations 

Recall that power method iterations are based on computing $h^T A^i f$ (see (8) and
(9)). Even if we are interested in evaluating the inner product (4), substituting $x$
with $x^{(k)}$ from (3) will give $(g, x) \approx g^T \varphi + g^T A \varphi + g^T A^2 \varphi + \ldots + g^T A^{k-1} \varphi +
g^T A^k x^{(0)}$, $k > 0$. Define the sets $G = [0, n)$ and $G_i = [i-1, i)$, $i = 1, \ldots, n$, and
likewise define the piecewise continuous functions $f(x) = f_i$, $x \in G_i$, $i = 1, \ldots, n$,
$a(x, y) = a_{ij}$, $x \in G_i$, $y \in G_j$, $i, j = 1, \ldots, n$, and $h(x) = h_i$, $x \in G_i$, $i =
1, \ldots, n$.
Computing $h^T A^i f$ is equivalent to computing an $(i+1)$-dimensional integral.
Thus we may analyze using QRNs in this case with bounds from numerical
integration. We do not know $A^i$ explicitly, but we do know $A$ and can use a
random walk on the elements of the matrix to compute approximately $h^T A^i f$.
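
The view of $h^T A^k f$ as a $(k+1)$-dimensional integral can be sketched as follows: each coordinate of a quasirandom point in $[0,1)^{k+1}$ selects one index, and the product of the selected entries is averaged. The code assumes uniform index selection ($p = 1/n$, i.e. the standard rather than the importance-sampling construction) and a Halton sequence built from the first ten primes; all names are hypothetical.

```python
PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]


def radical_inverse(index, base):
    """Van der Corput radical inverse of a positive integer in the given base."""
    result, fraction = 0.0, 1.0 / base
    while index > 0:
        result += (index % base) * fraction
        index //= base
        fraction /= base
    return result


def halton_point(index, dimension):
    """The index-th point of the Halton sequence in [0, 1)^dimension."""
    return [radical_inverse(index, PRIMES[d]) for d in range(dimension)]


def qmc_bilinear_form(a, h, f, k, samples):
    """Approximate h^T A^k f by averaging over (k+1)-dimensional Halton points."""
    n = len(a)
    total = 0.0
    for s in range(1, samples + 1):
        point = halton_point(s, k + 1)
        # Coordinate u selects the index i with n*u in [i, i+1) (0-based).
        idx = [min(int(u * n), n - 1) for u in point]
        weight = h[idx[0]] * n                      # h divided by p = 1/n
        for j in range(k):
            weight *= a[idx[j]][idx[j + 1]] * n     # a divided by p = 1/n
        total += weight * f[idx[k]]
    return total / samples


# Constant matrix: every walk has the same weight, so the average is exact.
n, c, k = 4, 0.5, 3
a = [[c] * n for _ in range(n)]
ones = [1.0] * n
value = qmc_bilinear_form(a, ones, ones, k, samples=64)   # n * (c*n)^k = 32
```

For a non-constant matrix the result is only an approximation whose error is governed by the discrepancy of the point set, which is what the error bound quoted in the text expresses.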



Fig. 1. Relative errors in computing $h^T A^k f$ for $k = 1, 2, \ldots, 10$ for a sparse
matrix $2000 \times 2000$. The corresponding Markov chains are realized using PRN,
Sobol and Halton sequences



Consider $h^T A^i f$ and an $(i+1)$-dimensional QRN sequence. Normalizing $A$ with
$1/n$, and $h$ and $f$ with $1/\sqrt{n}$, we have the following error bound (for proof see [7]):

$$\Big| h_N^T A_N^i f_N - \frac{1}{N} \sum_{s=1}^{N} h(x_s) a(x_s, y_s) \cdots a(z_s, w_s) f(w_s) \Big| \le |h|^T |A|^i |f| \, D_N^*.$$

If $A$ is a general sparse matrix with $d$ nonzero elements per row, and $d \ll n$,
then the importance sampling method can be used; the normalizing factors in the
error bound (3) are then $1/d$ for the matrix and $1/\sqrt{n}$ for the vectors.



2 Numerical Results 

Why are we interested in quasi-MCMs for the eigenvalue problem? Because the
computational complexity of QMCMs is bounded by $O(lN)$, where $N$ is the number
of chains and $l$ is the mathematical expectation of the length of the Markov
chains, both of which are independent of the matrix size $n$. This makes QMCMs
very efficient for large, sparse problems, for which deterministic methods are
not computationally efficient.



Fig. 2. Relative errors in computing $\lambda_{\max}$ using different lengths of Markov chains
for a sparse matrix $1024 \times 1024$. The random walks are realized using PRN,
Faure, Sobol and Halton sequences



Numerical tests were performed on general sparse matrices using PRNs and
Sobol, Halton and Faure QRNs. The relative errors in computing $h^T A^k f$, with $A$
a sparse matrix of order 2000 and $h = f = (1, 1, \ldots, 1)$, are presented in Figure 1.
The results confirm that the QRNs produce higher precision results than PRNs.
The more important fact is the smoothness of the quasirandom "iterations"
with $k$. This is important because these eigenvalue algorithms compute a Rayleigh
quotient, which requires the division of values from consecutive iterations.

The estimated $\lambda_{\max}$ and the corresponding relative errors using MCM and
QMCM are presented in Table 1. The exact value of $\lambda_{\max}$ for all test matrices
is 64.0000153. The results show an improvement in accuracy. Numerical experiments
using resolvent MCM and resolvent QMCM have also been performed;
the relative errors in computing $\lambda_{\max}$ using Markov chains with different lengths
are presented in Figures 2 and 3.
Fig. 3. Relative errors in computing $\lambda_{\max}$ using different lengths of Markov chains
for a sparse matrix $2000 \times 2000$. The random walks are realized using PRN,
Faure, Sobol and Halton sequences
Abstract. The generalized Schur algorithm (GSA) is a fast method to
compute the Cholesky factorization of a wide variety of structured matrices.
The stability property of the GSA depends on the way it is implemented.
In [15] GSA was shown to be as stable as the Schur algorithm,
provided one hyperbolic rotation in factored form [3] is performed at
each iteration. Fast and efficient algorithms for solving Structured Total
Least Squares problems [14,15] are based on a particular implementation
of GSA requiring two hyperbolic transformations at each iteration.
In this paper the authors prove the stability property of such an implementation,
provided the hyperbolic transformations are performed in factored
form [3].



1 Introduction 

The generalized Schur algorithm (GSA) is a fast method to compute the Cholesky
decomposition of a wide variety of symmetric positive definite structured matrices,
i.e., block-Toeplitz and Toeplitz-block matrices, matrices of the form $T^T T$,
where $T$ is a rectangular Toeplitz matrix [9,7], and to compute the $LDL^T$
factorization of strongly regular [1] structured matrices, where $L$ is a triangular
matrix and $D = \mathrm{diag}(\pm 1, \ldots, \pm 1)$. The stability property of the GSA depends
on the way it is implemented [11,5]. In [15] the GSA was shown to be stable,
provided one hyperbolic rotation in factored form [3] is performed at each iteration.
Similar results were obtained in [5], using the OD procedure or the H procedure
instead of using the hyperbolic rotations in factored form. The computational
complexity of GSA is $O(\alpha N^2)$, where $N$ is the order of the involved matrix and
$\alpha$ is its displacement rank (see §2).

The Structured Total Least Squares problem, as described in [14], solves
overdetermined Toeplitz systems of equations with unstructured right-hand side
and can be formulated as follows:

$$\min_{\Delta A, \Delta b, x} \|[\Delta A \;\; \Delta b]\|_F^2$$

such that $(A + \Delta A)x = b + \Delta b$, $A, \Delta A \in \mathbb{R}^{m \times n}$,

with $A$, $\Delta A$ Toeplitz matrices. The kernel of the algorithm proposed in [14]
is the solution of a least squares problem, where the coefficient matrix is a
rectangular Toeplitz-block matrix with dimensions $(2m + n - 1) \times (m + 2n - 1)$.
The sparsity structure of the corresponding generators is such that, when using
two hyperbolic rotations at each iteration rather than one, the complexity of
the GSA can be reduced from $O(\alpha(m + n)^2)$ to $O(\alpha mn)$. It is therefore worth
studying the stability of such a modification of the GSA, which is what we do in
this paper. Although the sparsity of the generators is exploited in this modified
GSA, we need not keep track of this zero pattern in order to perform the error
analysis. The paper is organized as follows. In §2 we describe the implementation
of the GSA when using two hyperbolic rotations; we analyze its stability property
in §3 and we conclude in §4.



2 The Generalized Schur Algorithm 



In this section we introduce the GSA to compute the $R^T R$ factorization of a
symmetric positive definite matrix $A$, where $R$ is an upper triangular matrix,
using two hyperbolic rotations at each iteration.

Given an $n \times n$ symmetric positive definite matrix $A$, define $D_A = A - ZAZ^T$.
We say that the displacement rank of $A$ with respect to $Z$ is $\alpha$ if $\mathrm{rank}(D_A) = \alpha$,
where $Z$ is the lower triangular (block) shift matrix of order $n$ (for a more general
choice of the matrix $Z$, see [9,6]). Clearly $D_A$ will have a decomposition of the
form $D_A = G^T J_A G$, where

$$G = [\, u \;\; U^T \;\; v \;\; z \;\; V^T \,]^T, \qquad J_A = \begin{bmatrix} I_p & \\ & -I_q \end{bmatrix}, \qquad p + q = \alpha, \tag{1}$$

with $I_k$ the identity matrix of
order $k$. The pair $(G, J_A)$, $G \in \mathbb{R}^{\alpha \times n}$, is said to be a generator pair for $A$ [12]. A
matrix $\Theta$ is said to be $J_A$-orthogonal if $\Theta^T J_A \Theta = J_A$.
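
The notion of displacement rank can be illustrated on a small symmetric Toeplitz matrix, for which $A - ZAZ^T$ is nonzero only in its first row and column. The sketch below uses plain Python lists and a simple elimination-based rank; the helper names and the sample matrix are hypothetical.

```python
def displacement(a):
    """Return D = A - Z A Z^T, with Z the lower shift matrix of order n."""
    n = len(a)
    # (Z A Z^T)[i][j] equals A[i-1][j-1] for i, j >= 1 and is 0 otherwise.
    return [[a[i][j] - (a[i - 1][j - 1] if i > 0 and j > 0 else 0.0)
             for j in range(n)] for i in range(n)]


def numerical_rank(matrix, tol=1e-12):
    """Rank by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]
    rows, cols = len(m), len(m[0])
    r = 0
    for c in range(cols):
        if r == rows:
            break
        pivot = max(range(r, rows), key=lambda i: abs(m[i][c]))
        if abs(m[pivot][c]) <= tol:
            continue                      # no usable pivot in this column
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, rows):
            factor = m[i][c] / m[r][c]
            for j in range(c, cols):
                m[i][j] -= factor * m[r][j]
        r += 1
    return r


# Symmetric Toeplitz matrix defined by its first row.
t = [4.0, 1.0, 0.5, 0.25]
T = [[t[abs(i - j)] for j in range(4)] for i in range(4)]
alpha = numerical_rank(displacement(T))   # displacement rank of T
```

The full matrix has rank 4, while its displacement has rank 2; this low rank is what allows a generator with only a few rows to represent the matrix.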

The GSA requires $n$ iterations to compute the factor $R$. Let $G_{0,Z} = G$. At
the $i$th iteration, $i = 1, \ldots, n$, a $J_A$-orthogonal matrix $\Theta_i$ is chosen such that
the $i$th column of $G_i = \Theta_i G_{i-1,Z}$ has all the elements equal to zero with the
exception of a single pivot element in the first row (the first $i - 1$ columns of $G_i$
are zero). The generator matrix $G_i$ is said to be in proper form. Then the first
row of $G_i$ becomes the $i$th row of $R$. The generator matrix $G_{i,Z}$ at the next
iteration is given by

$$G_{i,Z}(1, :) = G_i(1, :)Z^T, \qquad G_{i,Z}([2 : \alpha], :) = G_i([2 : \alpha], :).$$

Without loss of generality, the matrices $\Theta_i$, $i = 1, \ldots, n$, can be factored as the
product of two hyperbolic rotations and an orthogonal one, i.e.,

$$\Theta_i = H_{i,1} H_{i,2} Q_i, \quad \text{where } Q_i = \begin{bmatrix} Q_{i,1} & \\ & Q_{i,2} \end{bmatrix},$$

with $Q_{i,1}$ and $Q_{i,2}$ orthogonal matrices of order $p$ and $q$, such that

$Q_i G_{i-1,Z}$ =

where $O_{r,s}$ denotes the rectangular null matrix with $r$ rows and $s$ columns. As
mentioned in §1, the computation of the hyperbolic rotation in a stable way is
crucial for the stability of the algorithm. For implementation details of hyperbolic
rotations in factored form see [3,15]. In the next section we will show that this
variant of the GSA is stable, provided in each iteration the $J_A$-orthogonal matrix
is computed as previously described, and the hyperbolic rotations are implemented
in factored form. Similar stability results hold considering either the H-procedure
or the OD-procedure to implement the hyperbolic rotations [5,12].
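
To make the role of a hyperbolic rotation concrete, the sketch below applies one to two generator rows $u$ and $v$ so that the leading entry of $v$ is annihilated while $u^T u - v^T v$ is preserved. Two evaluation orders are shown: the direct matrix product, and the so-called mixed form in which the new $v$ reuses the already computed new $u$. The mixed form is used here only as one well-known stabilized variant; it is not claimed to be the factored form of [3], and the function names are hypothetical. The code assumes $|v_1| < |u_1|$.

```python
import math


def hyperbolic_rotation_direct(u, v):
    """Apply H = [[1, -rho], [-rho, 1]] / sqrt(1 - rho^2), rho = v[0] / u[0]."""
    rho = v[0] / u[0]
    c = math.sqrt(1.0 - rho * rho)
    new_u = [(x - rho * y) / c for x, y in zip(u, v)]
    new_v = [(-rho * x + y) / c for x, y in zip(u, v)]
    return new_u, new_v


def hyperbolic_rotation_mixed(u, v):
    """Same transformation, with the new v computed from the new u (mixed form)."""
    rho = v[0] / u[0]
    c = math.sqrt(1.0 - rho * rho)
    new_u = [(x - rho * y) / c for x, y in zip(u, v)]
    new_v = [-rho * xu + c * y for xu, y in zip(new_u, v)]
    return new_u, new_v


def j_product(u, v, j, k):
    """Entry (j, k) of u u^T - v v^T, the quantity a J-orthogonal map preserves."""
    return u[j] * u[k] - v[j] * v[k]


u, v = [5.0, 1.0], [3.0, 2.0]
u1, v1 = hyperbolic_rotation_direct(u, v)
u2, v2 = hyperbolic_rotation_mixed(u, v)
```

In exact arithmetic both orders give the same rows; they differ only in how rounding errors enter, which is the point at which the choice of implementation affects stability.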



3 Stability Analysis 

A stability analysis of the GSA with a single hyperbolic rotation in factored
form per iteration is presented in [15]. The stability analysis for the algorithm
described in the previous section can be done in a similar way. It is split up into
two parts: one which shows how local errors propagate through the algorithm
and one which bounds the local errors. We consider the same notation as introduced
in §2 but denote by a tilde the corresponding quantities as stored
in the computer. Hence $\tilde G_i = [\, \tilde u_i \;\; \tilde U_i^T \;\; \tilde v_i \;\; \tilde z_i \;\; \tilde V_i^T \,]^T$. The local errors, generated
by computing $\tilde G_{i+1}$ by means of orthogonal and hyperbolic transformations, are
given by

$$\epsilon F_i = \tilde G_{i+1}^T J_A \tilde G_{i+1} - \tilde G_{i,Z}^T J_A \tilde G_{i,Z} + O(\epsilon^2), \quad i = 1, \ldots, n, \tag{2}$$

where $\epsilon$ is the machine precision. In [15] it is proved that

$$A - \tilde R^T \tilde R = \sum_{j=0}^{n-1} Z_j (G^T J_A G - \tilde G^T J_A \tilde G) Z_j^T - \epsilon \sum_{j=0}^{n-1} \sum_{k=1}^{n-j-1} Z_j F_k Z_j^T + O(\epsilon^2), \tag{3}$$

where $Z_j = Z^j$ and $\tilde R$ is the computed Cholesky factor. This means that if the
errors in the computation of the initial generator matrix and the local errors are
bounded, the algorithm is stable. The error in the initial generator matrix is not
a problem, since often it is explicitly known or can be computed in a backward
stable way [8]. Below, we assume that the initial generator matrix is computed
exactly and restrict ourselves to the effects of local errors due to the orthogonal
and hyperbolic transformations.

Because any bounds on the errors produced by the transformations will depend
on the norm of the generators, it is essential to bound the generators.

Theorem 1. When the generators are computed by applying a block diagonal
orthogonal matrix and two hyperbolic transformations, they satisfy

$$\|G_i\|_F^2 \le 2\sqrt{i-1}\,\|A\|_F + \|G\|_F^2. \tag{4}$$

Proof. Let $u_i$, $v_i$ and $z_i$ be the generator vectors in (1) that will be modified by
the two hyperbolic rotations $H_{i,2}$ and $H_{i,1}$,

Then

Applying the inequality

Since the orthogonal transformations do not affect the norm of the generators
and $\|Z\|_2 = 1$, the squared norm of each new generator exceeds that of the
previous one by at most $2\|u_i\|_2^2$, and recursively we have

$$\|G_i\|_F^2 \le 2\sum_{j=1}^{i-1} \|u_j\|_2^2 + \|G\|_F^2 = 2\|R(1 : i-1, :)\|_F^2 + \|G\|_F^2.$$

Then (4) follows since, for an arbitrary positive semi-definite, rank $i - 1$ matrix
with a factorization $A = R^T R$ (see [15]), $\|R\|_F^2 \le \sqrt{i-1}\,\|A\|_F$. □



To complete the stability analysis we need to show that the orthogonal and
hyperbolic transformations, applied in factored form, produce a local error, $\epsilon F_i$,
which is proportional to the norm of the generator matrix. An error analysis of
hyperbolic transformations applied in factored form is given in [3]. Denoted by

where $\epsilon$ is the roundoff unit. Furthermore, concerning the application of the
orthogonal transformations, it can be proved [15,16] that there exist orthogonal
matrices $Q_{i,1}$ and $Q_{i,2}$ such that

Letting $\Delta G_i = [\, \Delta u_i \;\; \Delta U_i^T \;\; \Delta v_i \;\; \Delta z_i \;\; \Delta V_i^T \,]^T$, then $\|\Delta G_i\|_F \le 6m\epsilon\|G_{i,Z}\|_F \le
6m\epsilon\|G_i\|_F$. Analogously, letting $\Delta\tilde G_i = [\, \Delta\tilde u \;\; \Delta\tilde v \;\; \Delta\tilde z \,]^T$, then the error bounds (5)
and (7) can be used to show that

$$\tilde G_i^T J_A \tilde G_i = (G_{i,Z} + \Delta G_i)^T J_A (G_{i,Z} + \Delta G_i),$$

$$(\tilde G_{i+1} + e_1 \Delta\tilde u^T)^T J_A (\tilde G_{i+1} + e_1 \Delta\tilde u^T) = (\tilde G_i + e_{p+1} \Delta\tilde v^T + e_{p+2} \Delta\tilde z^T)^T J_A (\tilde G_i + e_{p+1} \Delta\tilde v^T + e_{p+2} \Delta\tilde z^T),$$

where $e_1$, $e_{p+1}$ and $e_{p+2}$ are standard basis vectors. Then



eFi — Gf^^JAAGi + AGf JaGi^z 



[iipi+i Diy zi,i]AGi - Z\Gj 



- -j, 

«M +1 



-IS 



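corresponding to the bound

$$\|\epsilon F_i\|_F \le 2\|G_{i,Z}\|_F \|\Delta G_i\|_F + 2(\|G_i\|_F + \|G_{i+1}\|_F)\|\Delta\tilde G_i\|_F$$
$$\le 12m\epsilon\|G_{i,Z}\|_F^2 + 25\epsilon(\|G_i\|_F + \|G_{i+1}\|_F)^2$$
$$\le 12m\epsilon\|G_i\|_F^2 + 25\epsilon(\|G_i\|_F + \|G_{i+1}\|_F)^2 + O(\epsilon^2).$$

By Theorem 1, $\|G_i\|_F^2, \|G_{i+1}\|_F^2 \le 2\sqrt{i}\,\|A\|_F + \|G\|_F^2$, so the following bound
holds:

$$\|\epsilon F_i\|_F \le (12m + 100)\epsilon\,(2\sqrt{i}\,\|A\|_F + \|G\|_F^2).$$

From (3), we have $\|A - \tilde R^T \tilde R\|_F \le (6m + 50)(n - 1)n\epsilon\,(2\sqrt{n}\,\|A\|_F + \|G\|_F^2)$.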
4 Conclusion

Fast and efficient algorithms for solving Structured Total Least Squares problems
[14,15] are based on a particular implementation of the GSA requiring two
hyperbolic transformations at each iteration.

In this paper the stability of such an implementation is discussed. It is proved
that if the hyperbolic transformations are performed in factored form, the
considered implementation is as stable as the implementation studied in [15] that
requires only one hyperbolic transformation at each iteration.
Abstract. Three-level operator finite difference schemes on non-uniform
time grids in finite-dimensional Hilbert spaces are considered.
A priori estimates for uniform stability with respect to the initial conditions are
obtained under natural assumptions on the operators and the non-uniform time
grids. The results obtained here are applied to study the stability of
three-level weighted schemes of second order of approximation $O(h^2 + \tau_n^2)$ for
some hyperbolic and parabolic equations of the second order. It is essential
to note that the schemes of raised order of approximation are
constructed here on the standard stencils that are used in finite difference
approximation techniques.



1 Introduction 

Contemporary computational methods of mathematical physics, alongside
the traditional requirements such as stability and conservativity, also have to
satisfy the adaptivity requirement. The application of adaptive grids means first
of all that one has to use, instead of a uniform grid, a non-uniform one that is
adapted to the behaviour of the singularities of the solution. It is known that
the use of non-uniform grids lowers the order of local approximation. One
can increase the order of approximation simply by using more extended stencils
or by considering more restricted classes of solutions of the differential problem.
Let us call attention to another opportunity to increase accuracy: moving
the approximation of the initial differential equations from the points of the
computational grid to some intermediate points of the computational domain [1].
At present, computational methods on non-uniform spatial grids have been widely
studied for a wide class of equations of mathematical physics with preservation of
the second order of local approximation with respect to the spatial variable [1]-[6].
Nevertheless, the theoretical aspects of three-level schemes on non-uniform
time grids are less investigated [7,8].

This communication is devoted to the investigation of three-level operator
finite difference schemes on non-uniform time grids with operators acting
in finite-dimensional Hilbert spaces. Stability with respect to the initial conditions
is proved and a priori estimates in grid energy norms are obtained. Examples of
three-level finite difference schemes of the second order of local approximation
in the time and spatial variables for parabolic and hyperbolic equations of the second
order are presented. We especially emphasize that the increase of the order of local
approximation on non-uniform grids is achieved without enlarging the standard
stencil of the finite difference scheme.



2 Three Level Operator Finite Difference Schemes 



Let us consider a real finite-dimensional Hilbert space $H$ of real-valued functions
defined on the non-uniform time grid

$$\bar\omega_\tau = \{t_n = t_{n-1} + \tau_n,\ n = 1, 2, \ldots, N_0,\ t_0 = 0,\ t_{N_0} = T\} = \omega_\tau \cup \{0, T\}.$$

We denote by $D(t), B(t), A : H \to H$ linear operators in $H$. Let us
consider the Cauchy problem for the homogeneous operator finite difference equation

$$D y_{\bar t \hat t} + B y_t + A y = 0, \quad y_0 = u_0, \quad y_1 = u_1, \tag{1}$$

where $y = y_n = y(t_n) \in H$ is the unknown function, and $u_0, u_1 \in H$ are given
functions. Here and in the following, index-free notation is used:

$$y_{\bar t \hat t} = (y_t - y_{\bar t})/\tau^*, \quad y_t = (y_{n+1} - y_n)/\tau_{n+1}, \quad y_{\bar t} = (y_n - y_{n-1})/\tau_n,$$

$$\hat y = y_{n+1}, \quad \check y = y_{n-1}, \quad \tau^* = 0.5(\tau_{n+1} + \tau_n), \quad y_{\mathring t} = \frac{y_{n+1} - y_{n-1}}{\tau_n + \tau_{n+1}}.$$
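
As a small check of the notation, the non-uniform second difference $(y_t - y_{\bar t})/\tau^*$ reproduces the second derivative of a quadratic exactly, whatever the two steps are. The sketch below assumes this reading of the garbled notation; the function name and the sample values are hypothetical.

```python
def second_difference(y_prev, y_curr, y_next, tau_n, tau_next):
    """Non-uniform second difference (y_t - y_tbar) / tau_star at the node t_n."""
    y_t = (y_next - y_curr) / tau_next       # forward difference over tau_{n+1}
    y_tbar = (y_curr - y_prev) / tau_n       # backward difference over tau_n
    tau_star = 0.5 * (tau_next + tau_n)      # averaged step
    return (y_t - y_tbar) / tau_star


# y(t) = t^2 on the nodes 1.0, 1.5, 1.75 (steps 0.5 and 0.25).
t_prev, tau_n, tau_next = 1.0, 0.5, 0.25
t_curr, t_next = t_prev + tau_n, t_prev + tau_n + tau_next
d2 = second_difference(t_prev ** 2, t_curr ** 2, t_next ** 2, tau_n, tau_next)
```

For functions that are not quadratic, the same quotient is in general only first-order accurate at $t_n$ when the two steps differ, which is the loss of accuracy on non-uniform grids that the paper addresses.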

Let us denote by $H_{R_k}$, where $R_k^* = R_k \ge 0$, the space with inner product
$(y, v)_{R_k} = (R_k y, v)$, $y, v \in H$, and with semi-norm $\|y\|_{R_k}^2 = (R_k y, y)$. Let us suppose that
the operators entering the scheme (1) satisfy the following conditions:

$$D(t) = D^*(t) \ge 0, \quad B(t) \ge 0, \quad t \in \omega_\tau, \quad A = A^* > 0, \tag{2}$$

$$D(t + \tau) \le D(t), \quad \frac{\tau_{n+1}}{\tau_{n+2}} \le \frac{\tau_n}{\tau_{n+1}}, \quad B(t_n) \ge 0.5\tau_{n+1}A, \tag{3}$$

where $A(t) = A$ is a constant operator. Concerning conditions (3) we shall make
some observations.


Remark 1. Usually in the theory of stability of three-level finite difference
schemes [1] with a variable operator $D(t)$, its Lipschitz continuity in the variable
$t$ is required. However, if one studies the stability, for example, of the weighted
three-level scheme [2]

$$y_{\bar t \hat t} + A y^{(\sigma_1, \sigma_2)} = 0, \quad y_0 = u_0, \quad y_1 = u_1, \tag{4}$$

$$y^{(\sigma_1, \sigma_2)} = \sigma_1 \hat y + (1 - \sigma_1 - \sigma_2) y + \sigma_2 \check y, \tag{5}$$

then this requirement implies the undesirable requirement of quasi-uniformity of the
time grid



Tn\ ^ 



n= l,2,...,A^o-l • 



( 6 ) 
Remark 2. The second restriction from (3), $\tau_{n+1}/\tau_{n+2} \le \tau_n/\tau_{n+1}$, is not rigid.
Indeed, let the steps of the grid be chosen according to the geometric progression
law $\tau_{n+1} = q\tau_n$. Then the given inequality is valid for any $q = \mathrm{const} > 0$.
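
The observation of Remark 2 can be verified directly: on a geometric grid both ratios equal $1/q$, so the inequality holds with equality. The following sketch builds such a grid and tests the ratio condition; the function names and the tolerance are hypothetical.

```python
def geometric_time_steps(tau_1, q, count):
    """Return count steps tau_1, tau_2, ... with tau_{n+1} = q * tau_n."""
    taus = [tau_1]
    for _ in range(count - 1):
        taus.append(q * taus[-1])
    return taus


def satisfies_ratio_condition(taus, tol=1e-12):
    """Check tau_{n+1}/tau_{n+2} <= tau_n/tau_{n+1} for all consecutive triples."""
    return all(taus[n + 1] / taus[n + 2] <= taus[n] / taus[n + 1] + tol
               for n in range(len(taus) - 2))


condensing = geometric_time_steps(0.1, 0.5, 8)   # steps shrink (q < 1)
dilating = geometric_time_steps(0.1, 2.0, 8)     # steps grow (q > 1)
irregular = [1.0, 1.0, 0.5]                      # a grid violating the condition
```

The small tolerance only guards against rounding when the two ratios are equal in exact arithmetic.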

Before we formulate the results, we give the definition of stability of the finite
difference scheme (1) in the case of linear operators $D$, $B$, $A$.

Definition 1. The operator finite difference scheme (1) is called unconditionally
stable with respect to the initial conditions if there exist positive constants $M_1 > 0$, $M_2 > 0$,
independent of $\tau_n$ and $u_0 \in H$, $u_1 \in H$, such that for all sufficiently small $\tau_n^* \le
\tau_0$, $n = 1, 2, \ldots, N_0$, the solution of the Cauchy problem (1) satisfies the estimate

$$\|y_{\bar t, n}\|_{D_n}^2 + \|y_n\|_{R_n}^2 \le M_1 \|y_{\bar t, 1}\|_{D_1}^2 + M_2 \|y_1\|_{R_1}^2. \tag{7}$$

If the inequality (7) is valid for every $\tau_n$, then the scheme is called absolutely
stable and, when $M_1 = 1$, $M_2 = 1$, uniformly stable.

Let us prove the following assertion.

Theorem 1. Let us suppose that the conditions (2), (3) are valid. Then the
finite difference scheme (1) is uniformly stable with respect to the initial conditions
and the following estimate is valid:

$$\|y_{\bar t, n+1}\|_{D_{n+1}}^2 + \|y_{n+1}\|_{R_{n+1}}^2 \le \|y_{\bar t, 1}\|_{D_1}^2 + \|y_1\|_{R_1}^2, \tag{8}$$

where $R_n = 0.5(1 + \tau_n/\tau_{n+1})A$.

Proof. Taking the inner product of both sides of equation (1) with $2\tau^* y_t$
and using the first condition from (3), one has

$$2\tau^*(D y_{\bar t \hat t}, y_t) = 2\tau^*(D y_{\bar t \hat t},\ 0.5(y_t + y_{\bar t}) + 0.5\tau^* y_{\bar t \hat t})$$
$$= \|y_t\|_D^2 - \|y_{\bar t}\|_D^2 + \tau^{*2}\|y_{\bar t \hat t}\|_D^2 \ge \|y_{\bar t, n+1}\|_{D_{n+1}}^2 - \|y_{\bar t, n}\|_{D_n}^2,$$

$$2\tau^*(B y_t, y_t) = 2\tau_n^*\|y_{t,n}\|_B^2.$$

If the second condition from (3) is satisfied then one has the inequality $\tau_n^*/\tau_{n+1} \ge
\tau_{n+1}^*/\tau_{n+2}$. Therefore, using the last estimate, one obtains

$$2\tau^*(Ay, y_t) = \frac{\tau^*}{\tau_{n+1}}(\|y_{n+1}\|_A^2 - \|y_n\|_A^2) - \tau_{n+1}\tau^*\|y_{t,n}\|_A^2$$
$$\ge \|y_{n+1}\|_{R_{n+1}}^2 - \|y_n\|_{R_n}^2 - 2\tau^*\|y_{t,n}\|_{0.5\tau_{n+1}A}^2.$$

Summing these estimates and using the third condition from (3), one has the
following relation:

$$\|y_{\bar t, n+1}\|_{D_{n+1}}^2 + \|y_{n+1}\|_{R_{n+1}}^2 \le \|y_{\bar t, n}\|_{D_n}^2 + \|y_n\|_{R_n}^2,$$

which is valid for every $n = 1, \ldots, N_0 - 1$. This immediately implies the
desired estimate (8). □
Example 1. Let us consider the weighted three-level operator finite difference scheme
(4). Using the identity

$$y^{(\sigma_1, \sigma_2)} = y + (\sigma_1\tau_{n+1} - \sigma_2\tau_n) y_t + \sigma_2\tau_n\tau^* y_{\bar t \hat t}, \tag{9}$$

this scheme can be reduced to its canonical form (1) with

$$D_n = E + \tau_n\tau^*\sigma_2 A, \quad B_n = (\sigma_1\tau_{n+1} - \sigma_2\tau_n)A.$$

One can note that the conditions of Theorem 1,

$$D_{n+1} - D_n = \sigma_2(\tau_{n+1}\tau_{n+1}^* - \tau_n\tau_n^*)A \le 0,$$

$$B_n - 0.5\tau_{n+1}A = (\tau_{n+1}(\sigma_1 - 0.5) - \tau_n\sigma_2)A \ge 0,$$

are satisfied if

$$\sigma_2\tau_{n+1}\tau_{n+1}^* \le \sigma_2\tau_n\tau_n^*, \quad \sigma_1 \ge \frac{1}{2} + \frac{\tau_n}{\tau_{n+1}}\sigma_2.$$

On the geometric grid $\tau_{n+1} = q\tau_n$ the first of the above inequalities is, for $\sigma_2 > 0$,
satisfied on a condensing grid with $q \le 1$, and for $\sigma_2 < 0$ it is satisfied
on a dilating grid. If $\sigma_2 = 0$, $\sigma_1 = \sigma$, then the scheme (4) can be transformed to
the following form (with constant operator $D = E$):

$$y_{\bar t \hat t} + A y^{(\sigma)} = 0, \quad y_0 = u_0, \quad y_1 = u_1. \tag{10}$$

Here $y^{(\sigma)} = \sigma y_{n+1} + (1 - \sigma)y_n$. As Theorem 1 affirms, its solution satisfies
the a priori estimate

$$\|y_{\bar t, n}\|^2 + \|y_n\|_{R_n}^2 \le \|y_{\bar t, 1}\|^2 + \|y_1\|_{R_1}^2, \quad n = 1, 2, \ldots, N_0, \tag{11}$$

(here still $R_n = 0.5(1 + \tau_n/\tau_{n+1})A$, and $A^* = A > 0$ is a constant operator) if
the conditions

$$\sigma_1 \ge \frac{1}{2} + \frac{\tau_n}{\tau_{n+1}}\sigma_2, \quad \frac{\tau_n}{\tau_{n+1}} \ge \frac{\tau_{n+1}}{\tau_{n+2}} \tag{12}$$

are satisfied.

Example 2. A scheme of the second order of local approximation on a non-uniform
time grid. In the rectangle $\bar Q_T = \bar\Omega \times [0, T]$, $\bar\Omega = \{x : 0 \le x \le l\}$, $0 \le t \le T$, let us
consider the first initial boundary value problem for the one-dimensional parabolic
equation

$$\frac{\partial u}{\partial t} = \frac{\partial}{\partial x}\Big(k(x)\frac{\partial u}{\partial x}\Big), \quad 0 < x < l, \quad 0 < t \le T, \tag{13}$$

$$u(0, t) = u(l, t) = 0, \quad t > 0, \quad u(x, 0) = u_0(x), \quad 0 \le x \le l, \tag{14}$$

where $0 < c_1 \le k(x) \le c_2$, $c_1, c_2 = \mathrm{const}$. On the grid, uniform in space and
non-uniform in time,

$$\bar\omega = \bar\omega_h \times \bar\omega_\tau, \quad \bar\omega_h = \{x_i = ih,\ i = 0, 1, \ldots, N,\ hN = l\},$$

let us approximate the differential problem (13), (14) by the following finite
difference scheme:

$$y_t + 0.5\tau_+ y_{\bar t \hat t} = (a \hat y_{\bar x})_x, \quad \tau_+ = \tau_{n+1}, \tag{15}$$

$$\hat y_0 = \hat y_N = 0, \quad y(x, 0) = u_0(x), \quad x \in \bar\omega_h. \tag{16}$$

Here

$$a = a_i = 0.5(k_{i-1} + k_i), \quad k_i = k(x_i), \quad y = y_i^n = y(x_i, t_n),$$

$$(a y_{\bar x})_x = (a_{i+1} y_{\bar x, i+1} - a_i y_{\bar x, i})/h, \quad y_{\bar x, i} = (y_i - y_{i-1})/h.$$

It is easy to verify that at the node $(x_i, t_{n+1})$ the three-level scheme (15), (16)
approximates the differential problem with the second order, that is

$$\psi_i^{n+1} = -u_{t,i} - 0.5\tau_+ u_{\bar t \hat t, i} + (a \hat u_{\bar x})_{x,i} = O(h^2 + \tau_{n+1}^2).$$

The scheme (15) is a generalization of the well-known asymptotically stable scheme
[3, p.309] to a non-uniform time grid:

$$\frac{3}{2} y_t - \frac{1}{2} y_{\bar t} = (a \hat y_{\bar x})_x.$$

The scheme (15), (16) can be transformed to the operator finite difference scheme
(1) by putting $y = y^n = (y_1^n, y_2^n, \ldots, y_{N-1}^n)^T$, $(Ay)_i = -(a y_{\bar x})_{x,i}$, $i = 1, \ldots, N -
1$, $y_0 = y_N = 0$, $D_n = 0.5\tau_{n+1}E$, $B_n = E + \tau_{n+1}A$. In this example the space
$H = H_h$ consists of grid functions which are defined on the grid $\bar\omega_h$ and which
are equal to zero on the boundary. The scalar product and norm are defined by the
expressions

$$(y, v) = \sum_{i=1}^{N-1} h y_i v_i, \quad \|y\| = \sqrt{(y, y)}.$$

The properties of the operator $A$ are well investigated [3]. In particular, $A^* =
A \ge \delta E$, $\delta = 8c_1/l^2$. Let us check the conditions of Theorem 1. It is obvious
that $D_n \le D_{n-1}$ for $\tau_{n+1} \le \tau_n$, and that $B_n - 0.5\tau_{n+1}A = E + 0.5\tau_{n+1}A > 0$ for every
$\tau_{n+1} > 0$. Hence, the scheme (15), (16) is uniformly stable with respect to the initial data if

$$\frac{\tau_{n+1}}{\tau_{n+2}} \le \frac{\tau_n}{\tau_{n+1}}, \quad \tau_{n+1} \le \tau_n. \tag{17}$$

Let us note that the conditions (17) are satisfied on the geometric grid $\tau_{n+1} = q\tau_n$
with arbitrary $0 < q \le 1$.



3 Finite Difference Schemes of Raised Order of Approximation on Non-uniform Time and Space Grids

Suppose that in the domain $\bar Q_T$ it is required to find a continuous function $u(x, t)$
satisfying the following initial boundary value problem:

$$\frac{\partial^2 u}{\partial t^2} = \frac{\partial^2 u}{\partial x^2}, \quad 0 < x < l, \quad 0 < t \le T,$$

$$u(0, t) = u(l, t) = 0, \quad u(x, 0) = u_0(x), \quad \frac{\partial u}{\partial t}(x, 0) = \bar u_0(x).$$

Let us consider the following non-uniform space-time grid $\bar\omega = \bar\omega_h \times \bar\omega_\tau$:

$$\bar\omega_h = \{x_i = x_{i-1} + h_i,\ i = 1, 2, \ldots, N,\ x_0 = 0,\ x_N = l\} = \omega_h \cup \{x_0 = 0,\ x_N = l\},$$

$$\bar\omega_\tau = \{t_n = t_{n-1} + \tau_n,\ n = 1, 2, \ldots, N_0,\ t_0 = 0,\ t_{N_0} = T\} = \omega_\tau \cup \{t_0 = 0,\ t_{N_0} = T\}.$$

We approximate the differential problem on this grid by the finite difference one

$$y_{\bar t \hat t} + \frac{h_+ - h}{3} y_{\bar t \hat t x} = y_{\bar x \hat x}^{(\sigma_1, \sigma_2)}, \tag{18}$$

$$y_0^{n+1} = y_N^{n+1} = 0, \quad y_i^0 = u_0, \quad y_{t,i}^0 = \tilde u_0, \quad i = 0, 1, \ldots, N. \tag{19}$$

Let us note that $\tilde u_0(x)$, $x \in \bar\omega_h$, is chosen in such a way that the error of
approximation of the second initial condition has order $O(\tau_1^2)$:

$$\tilde u_0(x) = \bar u_0(x) + 0.5\tau_1 u_0''(x).$$

Here the usual notation is used [1]:

$$h_+ = h_{i+1}, \quad h = h_i, \quad y = y_i^n = y(x_i, t_n), \quad y_{\bar x} = (y_i - y_{i-1})/h_i,$$

$$y_{\bar x \hat x} = (y_x - y_{\bar x})/\hbar, \quad y_x = (y_+ - y)/h_+, \quad y_\pm = y(x_{i \pm 1}, t_n), \quad \hbar = 0.5(h_+ + h).$$
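Let us show that, with respect to the supplementary node $(\bar x_i, \bar t_n)$,

$$\bar x_i = \frac{1}{3}(x_{i-1} + x_i + x_{i+1}) = x_i + \frac{h_{i+1} - h_i}{3}, \quad \bar t_n = \frac{1}{3}(t_{n-1} + t_n + t_{n+1}) = t_n + \frac{\tau_{n+1} - \tau_n}{3}, \tag{20}$$

with

$$\sigma_1\tau_{n+1} - \sigma_2\tau_n = \frac{\tau_{n+1} - \tau_n}{3}, \tag{21}$$

the finite difference scheme (18), (19) approximates the differential problem with
the second order $O(h^2 + \tau^{*2})$. For this purpose we shall rewrite the residual $\psi$
as

$$\psi = u_{\bar x \hat x}^{(\sigma_1, \sigma_2)} - u_{\bar t \hat t} - \frac{h_+ - h}{3} u_{\bar t \hat t x} = \psi_1 + \psi_2,$$

$$\psi_1 = u_{\bar x \hat x}^{(\sigma_1, \sigma_2)} - \frac{\partial^2 \bar u}{\partial x^2}, \quad \psi_2 = \frac{\partial^2 \bar u}{\partial t^2} - u_{\bar t \hat t} - \frac{h_+ - h}{3} u_{\bar t \hat t x}.$$

Here $\bar u = u(\bar x, \bar t)$, $\bar x = x + (h_+ - h)/3$, $\bar t = t + (\tau_+ - \tau)/3$. Let us note that an
advantage of the scheme (18), (19), (21) is the fact that for uniform grids $\bar\omega_h$, $\bar\omega_\tau$
(i.e. $\tau_+ = \tau$, $h_+ = h$) this scheme reduces to the classical scheme of the order
$O(h^2 + \tau^2)$ on a uniform grid [3]. We proceed with the analysis of $\psi_1$, $\psi_2$. Using
the identity (9) and the weight conditions (21), we conclude that for any grid
function $v(x_i, t_n)$ the following relation is valid:

$$v^{(\sigma_1, \sigma_2)} = v + \frac{\tau_+ - \tau}{3} v_t + \sigma_2\tau\tau^* v_{\bar t \hat t} = v(x, \bar t) + O(\tau^{*2}). \tag{22}$$

Hence,

$$\psi_1 = \psi_3 + O(\tau^{*2}), \quad \psi_3 = u_{\bar x \hat x}(x_i, \bar t) - \frac{\partial^2 \bar u}{\partial x^2}. \tag{23}$$

Using the Taylor series expansion it is easy to show that

$$\psi_3 = \frac{\partial^2 u}{\partial x^2}(x_i, \bar t_n) - \frac{\partial^2 u}{\partial x^2}(\bar x_i, \bar t_n) + \frac{h_+ - h}{3}\frac{\partial^3 u}{\partial x^3}(x_i, \bar t_n) + O(\hbar^2) = O(\hbar^2). \tag{24}$$

By virtue of the following relations (which one can easily obtain with the help of
Taylor's formula)

$$v(x_i, t_n) + \frac{h_+ - h}{3} v_{x,i} = v(\bar x_i, t_n) + O(\hbar_i^2),$$

$$\frac{\partial^2 v}{\partial t^2}(x_i, \bar t_n) - v_{\bar t \hat t}(x_i, t_n) = O(\tau^{*2}),$$

one concludes that the grid function $\psi_2$ is an infinitesimal of the second order,
that is,

$$\psi_2 = O(\hbar^2 + \tau^{*2}). \tag{25}$$

On the basis of the formulas (22)-(25) one concludes that the finite difference
scheme (18), (19), (21) approximates the initial boundary value problem for the
wave equation on the standard 9-point stencil (see Fig. 1) with the second
order (for a sufficiently smooth function $u(x, t)$).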
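
The role of the supplementary node can be seen on a cubic: the non-uniform second difference $(u_x - u_{\bar x})/\hbar$ at $x_i$ equals $u''$ evaluated at $\bar x_i = x_i + (h_{i+1} - h_i)/3$ exactly, while it differs from $u''(x_i)$ by a first-order term. The sketch below assumes this reading of the notation; the function name and the nodes are hypothetical.

```python
def second_difference_x(u, x_left, x_mid, x_right):
    """Non-uniform second difference (u_x - u_xbar) / h_bar at the node x_mid."""
    h = x_mid - x_left                      # h_i
    h_plus = x_right - x_mid                # h_{i+1}
    u_x = (u(x_right) - u(x_mid)) / h_plus  # forward difference
    u_xbar = (u(x_mid) - u(x_left)) / h     # backward difference
    return (u_x - u_xbar) / (0.5 * (h_plus + h))


def cubic(x):
    return x ** 3                           # second derivative is 6 x


x_left, x_mid, x_right = 1.0, 1.5, 1.75     # steps h = 0.5 and h_plus = 0.25
d2 = second_difference_x(cubic, x_left, x_mid, x_right)
x_bar = x_mid + ((x_right - x_mid) - (x_mid - x_left)) / 3.0
error_at_node = abs(d2 - 6.0 * x_mid)       # compared with u''(x_i)
error_at_x_bar = abs(d2 - 6.0 * x_bar)      # compared with u''(x_bar_i)
```

Shifting the point of approximation from $x_i$ to $\bar x_i$ removes the first-order error term without widening the stencil, which is the idea the scheme exploits in both space and time.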



For further investigation of the finite difference scheme (18), (19) some known 
formulas and identities are required: 



y + y , y + y 't+-t 

V = —A — ^ ^ -A — yi 



TT+ 



-yu 



yt + yt , T+-T 

+ — i— ■ 



(26) 

(27) 



4 



( 28 ) 
[Figure: nine-point stencil with nodes $(x_{i \pm 1}, t_{n \pm 1})$, spatial steps $h_i$, $h_{i+1}$ and time steps $\tau_n$, $\tau_{n+1}$]

Fig. 1.

Let us introduce scalar products and norms of functions defined over a non-uniform
spatial grid:

$$(y, v)_* = \sum_{i=1}^{N-1} \hbar_i y_i v_i, \quad \|y\|_*^2 = (y, y)_*, \quad (y, v] = \sum_{i=1}^{N} h_i y_i v_i, \quad \|y]|^2 = (y, y].$$

Lemma 1 (First finite difference Green's formula). For any grid function
$y(x)$ which is defined on the non-uniform grid $\bar\omega_h$ and vanishes at $x = 0$ and at $x = l$,
the following formula is valid:

$$(y, v_{\bar x \hat x})_* = -(y_{\bar x}, v_{\bar x}]. \tag{29}$$

One has the following theorem.

Theorem 2. Let us suppose that 



\\n/h\\c<c, ii-ilc 



max I • I, T„+i-T„ > 

xGlih 




^llc ’ 



(30)

and

$$\sigma_1 = \frac{2\tau_{n+1} + \tau_n}{6(\tau_{n+1} + \tau_n)}, \quad \sigma_2 = \frac{\tau_{n+1} + 2\tau_n}{6(\tau_{n+1} + \tau_n)}, \quad n = 1, \ldots, N_0 - 1. \tag{31}$$

Then the finite difference scheme (18), (19) of the second order of local approximation
$O(\hbar^2 + \tau^{*2})$ is uniformly stable and one has the estimate



\\yl 



dxx O"! 




S/£=>(0) 



( 32 ) 



where $v^{(0.5)}(t_n) = 0.5(v^{n+1} + v^n)$, $v_t^n = (v^{n+1} - v^n)/\tau_{n+1}$.
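
The time-dependent weights of Theorem 2 can be checked against the two relations used in the text. The sketch below assumes that the weights are $\sigma_1 = (2\tau_{n+1} + \tau_n)/(6(\tau_{n+1} + \tau_n))$ and $\sigma_2 = (\tau_{n+1} + 2\tau_n)/(6(\tau_{n+1} + \tau_n))$, that condition (21) reads $\sigma_1\tau_{n+1} - \sigma_2\tau_n = (\tau_{n+1} - \tau_n)/3$, and that property (36) reads $\sigma_1 + \sigma_2 = 1/2$; these are readings of formulas that are damaged in the source. Exact rational arithmetic is used so that the identities hold without rounding.

```python
from fractions import Fraction


def weights(tau_n, tau_next):
    """Weights sigma_1, sigma_2 for the steps tau_n and tau_{n+1} (assumed form)."""
    denominator = 6 * (tau_next + tau_n)
    sigma_1 = (2 * tau_next + tau_n) / denominator
    sigma_2 = (tau_next + 2 * tau_n) / denominator
    return sigma_1, sigma_2


tau_n, tau_next = Fraction(1, 10), Fraction(3, 10)
sigma_1, sigma_2 = weights(tau_n, tau_next)
weight_sum = sigma_1 + sigma_2                      # expected 1/2
shift = sigma_1 * tau_next - sigma_2 * tau_n        # expected (tau_next - tau_n) / 3
```

On a uniform grid both weights reduce to $1/4$ and the shift vanishes, in agreement with the remark that the scheme then coincides with a classical one.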
Proof. Let us note that $\sigma_1^n$, $\sigma_2^n$ defined by the formula (31) satisfy
the relation (21), which is necessary for increasing the approximation order on
a non-uniform grid. Let us now multiply the finite difference equation (18) by
$-2\tau^* \hbar_i y_{\mathring t \bar x \hat x, i}$ and sum over the inner nodes of the non-uniform space grid $\omega_h$.
After applying the formula (29) we obtain the energy identity:



2t* 



[Vtix^y: 



- 2t* 



h+ — h 



-y-tix^vi 



) +2t*( 

V* ^ 



yxx ’ y°. 



= 0 



(33) 



Applying identity (28), one finds the equality 



2t* {yttx^ = I \ytx]\^ - \ \ytx T + (^-n+i - tu) \ \y^f^] \ 



(34) 



Using now the formulas (26), (27) and the condition of second order approximation
(21), we obtain the following representation:



,(0'1.<T2) _ 1 
2 






T+-T 



(35) 



In deriving the formula (35) we used the property 



I 

CFt + <J-2 — - 



(36) 



Let us note that if the variable weight multipliers do not satisfy the equality
(36), then it is possible to prove stability of the finite difference scheme (18), (19)
only on a time grid that is quasi-uniform in the sense of (6). In this case the stability
estimate will not be uniform, that is, the constant $M_1 = \exp(c_0 T)$ appearing
in Definition 1 (see (7)) will be much greater than one. Taking into account
that $y_{\mathring t} = (y^{(0.5)} - \check y^{(0.5)})/\tau^*$, where $\check y^{(0.5)} = y^{(0.5)}(t_{n-1})$, and using (35) for the
third term in (33), one can find the following equality:



2t* 



^xx ’ y°^^.) 



( 0 , 5 ) 

^xx 



(tn) 



( 0 , 5 ) 

V-~ 

^XX 



{tn-l) 



t '^n+l ‘^n 



(37) 

Using the algebraic inequality $2ab \ge -a^2 - b^2$, we shall estimate the last remaining
scalar product in (33):



-2r* 



— h 
3 



'^n II i|2 

^ — mixll - 



{h+ - h)'^ hi 2 

7 1 y°_ 

Tn-\-l Tji rii tx 



(38) 



Substituting the obtained estimates (34), (37), (38) into the energy identity (33), we
come to the recurrence relation



Wvl. 






( 0 . 5 ) 

Vxx 



(tni ) 



from which the estimate (32) immediately follows. □

This work was supported by Byelorussian Republican Fund of Fundamental 
Researches (project F99R-153). 
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Abstract. We propose a new quadratically convergent algorithm, hav- 
ing a low computational cost per step and good numerical stability prop- 
erties, that allows the computation of the maximal solutions of the matrix
equations $X + C^*X^{-1}C = Q$, $X - C^*X^{-1}C = Q$,
$X + C^*(R + B^*XB)^{-1}C = Q$. The algorithm is based on the cyclic reduction method.

1 Introduction 

We consider the problem of the computation of the maximal Hermitian solutions 
of the matrix equations 



$$X + C^*X^{-1}C = Q, \tag{1}$$

$$X - C^*X^{-1}C = Q, \tag{2}$$

where $Q$ is an $m \times m$ Hermitian positive definite matrix, $C$ is an $m \times m$ matrix,
and $C^*$ denotes the conjugate transpose of $C$.

For Hermitian matrices $X$, $Y$, we write $X > Y$ ($X \ge Y$) if $X - Y$ is positive
definite (semidefinite); we say that $X_+$ is the maximal Hermitian solution of an
equation if $X_+ \ge X$ for any Hermitian solution $X$.

Equations (1) and (2) arise in a wide variety of research areas, that include 
control theory, ladder networks, dynamic programming, stochastic filtering and 
statistics (see [1,18,8] for a list of references). 

Equation (1) is a special discrete algebraic Riccati equation

$$-X + A^*XA + Q - (C + B^*XA)^*(R + B^*XB)^{-1}(C + B^*XA) = 0, \tag{3}$$

where $A$, $B$, $C$, $Q$, $R$ are $m \times m$ matrices, $Q$ and $R$ are Hermitian, obtained by
setting $A = R = 0$ and $B = I$.

The available numerical methods for the solution of (1) and (2) are based on 
fixed point iterations, or on applications of Newton’s algorithm [10,8,18,19,6,7,1]. 
In particular, in [10] the authors adapt the algorithm proposed in [9] for the com- 
putation of the maximal solution of (3), based on Newton’s iteration. Newton’s 
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iteration for solving matrix equations generates a sequence that quadratically 
converges to the sought solution, but generally has a large computational cost
per iteration; fixed point iterations have a lower computational cost at each step, 
but have linear convergence. 

Here we derive a new algorithm for computing the maximal solution of (1) 
and (2), that has a double exponential convergence, like Newton’s method, and 
a low computational cost per step, like fixed point iterations. 

The idea consists in rewriting equations (1) and (2) in terms of infinite block 
tridiagonal block Toeplitz systems and in applying a modification of the cyclic 
reduction algorithm to the above systems [12]. The computational cost per iteration
is $O(m^3)$ arithmetic operations (ops), like fixed point iterations. Moreover,
the algorithm shows good numerical stability properties. 

Finally we analyze the problem of the solution of the Riccati equation (3) in
the particular case where $A = 0$, i.e.,

$$-X + Q - C^*(R + B^*XB)^{-1}C = 0. \tag{4}$$

We show that, if $B$ is nonsingular, the problem of computing the maximal
solution of (4) can be reduced to that of computing the maximal solution of a
matrix equation of the form (1). Thus, the efficient solution of (1) allows the
efficient solution of equation (3) in the case $A = 0$ and $\det B \ne 0$.

A possible extension of these results for the solution of a general Riccati 
equation (3) is under study. 

The paper is organized as follows. In Section 2 we recall conditions for the 
existence of the maximal solution of (1), (2) and (3), and some spectral prop- 
erties of the solution, that will be used in the subsequent sections to show the 
convergence of our algorithm. In Sections 3 and 4 we present the algorithm for 
the solution of (1) and (2), respectively. In Section 5 we analyze the problem of 
the solution of (4). 

2 Existence and Properties of the Maximal Solution 

In this section we recall conditions for the existence of the maximal solution
$X_+$ of (1), (2) and (3), and some spectral properties of matrices related
to $X_+$, that will be used in the subsequent sections to show the convergence of
our algorithm.

Necessary and sufficient conditions for the existence of a positive definite
solution of (1) are provided in [7]. More specifically, let us introduce the rational
matrix function

$$\psi(\lambda) = \lambda C + Q + \lambda^{-1}C^*, \tag{5}$$

defined on the unit circle $S$ of the complex plane, that is Hermitian for any
$\lambda \in S$. This function is said to be regular if there exists at least one
$\lambda \in S$ such that $\det\psi(\lambda) \ne 0$.

The following fundamental results hold [7,18]; 
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Theorem 1. Equation (1) has a positive definite solution X if and only if 
$\psi(\lambda)$ is regular and $\psi(\lambda) \ge 0$ for all $\lambda \in S$. In that case (1) has a largest
solution $X_+$, which is the unique solution such that $X_+ + \lambda C$ is nonsingular
for $|\lambda| < 1$, and $\rho(X_+^{-1}C) \le 1$, where the symbol $\rho(\cdot)$ denotes the spectral radius.

In [10,7] the authors characterize the eigenvalues of $X_+^{-1}C$ and show that
$X_+^{-1}C$ has spectral radius strictly less than one if and only if $\psi(\lambda)$ is positive
definite on the unit circle:

Theorem 2. It holds $\rho(X_+^{-1}C) < 1$ if and only if $\psi(\lambda) > 0$ for all $\lambda \in S$.

Concerning equation (2), in [8] the authors prove the following results: 

Theorem 3. The set of solutions of (2) is non-empty, and admits a maximal 
element $X_+$, and $X_+$ is the unique positive definite solution. Moreover, the
spectral radius of $X_+^{-1}C$ is strictly less than one.

Consider now the discrete algebraic Riccati equation (3). Let us denote by
$\mathcal{R}(X)$ the map defined by the left-hand side of (3). For $A \in \mathbb{C}^{m \times m}$,
$B \in \mathbb{C}^{k \times m}$, the pair $(A, B)$ is said to be d-stabilizable if there exists a
$K \in \mathbb{C}^{m \times k}$ such that $A - KB$ is d-stable, i.e., all its eigenvalues are in the
open unit disk. The following result holds [9,11]:

Theorem 4. Let $(A, B)$ be a d-stabilizable pair and suppose that there is a
Hermitian solution $X$ of the inequality $\mathcal{R}(X) \ge 0$ for which $R + B^*XB > 0$. Then
there exists a maximal Hermitian solution $X_+$ of (3). Moreover, $R + B^*X_+B > 0$
and all the eigenvalues of $A - B(R + B^*X_+B)^{-1}(C + B^*X_+A)$ lie in $S$.

3 Computation of the Maximal Solution of 

X + C*X~^C = Q 

In this section we describe the new algorithm, based on cyclic reduction, that has
a computational cost per step roughly twice that of fixed point
iterations [8,18,19,6,7,1], and a double exponential convergence, like Newton’s
method [10]. For more details on the new algorithm we refer the reader to [12].

Throughout this section we suppose $\psi(\lambda)$ regular and positive semidefinite
for any $\lambda \in S$, where $\psi(\lambda)$ is given in (5), so that the conditions of Theorem 1
are satisfied.

Let $X$ be a solution of (1). Then, by multiplying on the right both sides of
(1) by $X^{-1}$, we find that

$$-I + QX^{-1} - C^*X^{-1}CX^{-1} = 0. \tag{6}$$

Thus, the matrix $G = X^{-1}C$ solves the quadratic matrix equation

$$-C + QG - C^*G^2 = 0. \tag{7}$$
In particular, if $X_+$ is the maximal solution of (1), since $\rho(X_+^{-1}C) \le 1$,
the matrix equation (7) has a solution $G_+ = X_+^{-1}C$ with spectral radius at
most 1.

The close relation between the matrix equation (1) and the quadratic matrix
equation (7), together with the spectral properties of the solutions of the latter
equation, allows us to derive a fast algorithm for the computation of $X_+$.

The matrix $X_+$ can be efficiently computed by rewriting the matrix
equations (6), (7) in terms of linear systems, and by applying the cyclic reduction
algorithm, according to the ideas developed in [2,3,4]. In fact, we observe that
the following system of equations is satisfied:



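$$
\begin{bmatrix}
Q & -C^* & & 0 \\
-C & Q & -C^* & \\
 & -C & Q & \ddots \\
0 & & \ddots & \ddots
\end{bmatrix}
\begin{bmatrix} I \\ G_+ \\ G_+^2 \\ \vdots \end{bmatrix} X_+^{-1}
=
\begin{bmatrix} I \\ 0 \\ 0 \\ \vdots \end{bmatrix}.
\tag{8}
$$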



By following the strategy successfully devised in [2,3,4] for solving nonlinear ma- 
trix equations arising in Markov chains, we apply the cyclic reduction algorithm 
to the above systems. This consists in performing an even-odd permutation of 
the block rows and columns, followed by one step of Gaussian elimination, thus 
generating the sequence of systems: 





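$$
\begin{bmatrix}
X_n & -C_n^* & & 0 \\
-C_n & Q_n & -C_n^* & \\
 & -C_n & Q_n & \ddots \\
0 & & \ddots & \ddots
\end{bmatrix}
\begin{bmatrix} I \\ G_+^{2^n} \\ G_+^{2 \cdot 2^n} \\ \vdots \end{bmatrix} X_+^{-1}
=
\begin{bmatrix} I \\ 0 \\ 0 \\ \vdots \end{bmatrix}.
\tag{9}
$$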



The block entries of each system are defined by the following recursions: 



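$$
\begin{aligned}
C_0 &= C, \quad Q_0 = X_0 = Q, \\
C_{n+1} &= C_nQ_n^{-1}C_n, \\
Q_{n+1} &= Q_n - C_nQ_n^{-1}C_n^* - C_n^*Q_n^{-1}C_n, \\
X_{n+1} &= X_n - C_n^*Q_n^{-1}C_n, \quad n \ge 0.
\end{aligned}
\tag{10}
$$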



Observe that the matrices $Q_n$, $X_n$ are Hermitian, and thus the matrices in (9)
are Hermitian.

The spectral theory of Hermitian block Toeplitz matrices [5,15,16,14,13,17]
guarantees the positive definiteness, and thus the nonsingularity, of the blocks
$Q_n$. Indeed, let us define the function $f : (-\pi, \pi) \to \mathcal{H}^m$,
$f(\theta) = -e^{\mathrm{i}\theta}C + Q - e^{-\mathrm{i}\theta}C^*$, where $\mathrm{i}$ is the imaginary unit
and $\mathcal{H}^m$ is the set of $m \times m$ Hermitian matrices, and denote by

$$\mu_1 = \inf_{\theta \in (-\pi,\pi)} \lambda_{\min}(f(\theta)), \qquad \mu_2 = \sup_{\theta \in (-\pi,\pi)} \lambda_{\max}(f(\theta)),$$

where $\lambda_{\min}(f(\theta))$ ($\lambda_{\max}(f(\theta))$) is the minimum (maximum) eigenvalue of $f(\theta)$.
Since $\psi(\lambda) \ge 0$ for any $\lambda \in S$, it holds $f(\theta) \ge 0$ and $\mu_1 \ge 0$. Moreover, since
$\psi(\lambda)$ is regular, the set where $f(\theta)$ is positive definite is given by $(-\pi, \pi)$ except
at most a finite number of points.

From these properties, the following result, that guarantees the applicability
of the cyclic reduction algorithm and the boundedness in norm of $Q_n$ and $X_n$, can
be proved [12]:

Theorem 5. The matrices $Q_n$, $X_n$, $n \ge 0$, are positive definite, and their
eigenvalues belong to the interval $[\mu_1, \mu_2]$. Moreover, it holds $0 < Q_{n+1} \le Q_n$,
$0 < X_{n+1} \le X_n$, for $n \ge 0$.

If $\psi(\lambda) > 0$ for any $\lambda \in S$ then $\mu_1 > 0$, thus also $Q_n^{-1}$ is bounded in norm,
and the condition number of $Q_n$ is bounded. In this case the sequence $\{X_n\}_n$
quadratically converges to $X_+$ (see [12]):

Theorem 6. If $\psi(\lambda) > 0$ for any $\lambda \in S$, then for any operator norm $\|\cdot\|$
and for any $\sigma$, $\rho(X_+^{-1}C) < \sigma < 1$, it holds
$\|I - X_nX_+^{-1}\| = O(\sigma^{2 \cdot 2^n})$ and $\|C_n\| = O(\sigma^{2^n})$.

These properties allow us to design a quadratically convergent algorithm
for the computation of the maximal solution $X_+$. Each step consists in
generating the blocks defined in formula (10); hence it requires the solution of two
Hermitian linear systems, i.e., the computation of $C_nQ_n^{-1}$ and $C_n^*Q_n^{-1}$, where
the matrices $Q_n$ have bounded condition number, and the computation of three
matrix products.
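As an illustration only, recursion (10) can be sketched in a few lines of NumPy. The function name `max_solution_plus`, the stopping rule based on the size of $C_n$, and the test matrices are assumptions made for this sketch and are not taken from [12]. Because $Q_n$ is Hermitian, the two solves below give $Q_n^{-1}C_n$ and $Q_n^{-1}C_n^*$, which are the adjoints of the quantities $C_n^*Q_n^{-1}$ and $C_nQ_n^{-1}$ mentioned above.

```python
import numpy as np

def max_solution_plus(C, Q, tol=1e-14, max_iter=60):
    """Iterate recursion (10); return (X_n, n) once C_n is negligible."""
    Cn = np.array(C, dtype=complex)
    Qn = np.array(Q, dtype=complex)
    Xn = Qn.copy()                       # C_0 = C, Q_0 = X_0 = Q
    for n in range(1, max_iter + 1):
        Cn_h = Cn.conj().T
        S = np.linalg.solve(Qn, Cn)      # Q_n^{-1} C_n   (first linear system)
        T = np.linalg.solve(Qn, Cn_h)    # Q_n^{-1} C_n^* (second linear system)
        P = Cn_h @ S                     # C_n^* Q_n^{-1} C_n
        Xn = Xn - P                      # X_{n+1}
        Qn = Qn - Cn @ T - P             # Q_{n+1}
        Cn = Cn @ S                      # C_{n+1}
        if np.linalg.norm(Cn, 1) <= tol * np.linalg.norm(Xn, 1):
            return Xn, n
    raise RuntimeError("C_n did not become negligible")

# Hypothetical data for which psi(lambda) > 0 on the unit circle.
C = np.array([[1.0, 0.5], [0.0, 1.0]])
Q = 5.0 * np.eye(2)
X, steps = max_solution_plus(C, Q)
residual = X + C.conj().T @ np.linalg.solve(X, C) - Q
```

In the scalar case $C = 1$, $Q = 3$ the iterates are $8/3$, $55/21$, $2584/987$, which approach $(3 + \sqrt{5})/2$ with the number of correct digits roughly doubling at each step.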

If the hypotheses of Theorem 6 are not satisfied, then $\rho(X_+^{-1}C) = 1$. In this
case $X_n$ and $Q_n$ are still Hermitian positive definite and bounded in norm. In
general, the sequence $\{Q_n^{-1}\}_n$ may be unbounded. However, if the sequence
$\{C_n\}_n$ converges to zero, and the sequence $\{(X_+^{-1}C)^{2^n}\}_n$ is bounded, then the
sequence $\{X_n\}_n$ still converges to $X_+$.

4 Computation of the Maximal Solution of 

X - C*X~^C = Q 

For the computation of the maximal solution $X_+$ of $X - C^*X^{-1}C = Q$ we can
apply a technique similar to the one used in the previous section. Specifically we
observe that

$$-I + QX_+^{-1} + C^*X_+^{-1}CX_+^{-1} = 0.$$

Thus, by setting $G_+ = X_+^{-1}C$, the following linear system is satisfied:



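$$
\begin{bmatrix}
Q & C^* & & 0 \\
-C & Q & C^* & \\
 & -C & Q & \ddots \\
0 & & \ddots & \ddots
\end{bmatrix}
\begin{bmatrix} I \\ G_+ \\ G_+^2 \\ \vdots \end{bmatrix} X_+^{-1}
=
\begin{bmatrix} I \\ 0 \\ 0 \\ \vdots \end{bmatrix}.
\tag{11}
$$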
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The infinite matrices in the above systems are block Toeplitz, but are not Her- 
mitian. However, if we apply one step of cyclic reduction, we obtain the following 
system: 



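$$
\begin{bmatrix}
X_1 & -C_1^* & & 0 \\
-C_1 & Q_1 & -C_1^* & \\
 & -C_1 & Q_1 & \ddots \\
0 & & \ddots & \ddots
\end{bmatrix}
\begin{bmatrix} I \\ G_+^2 \\ G_+^4 \\ \vdots \end{bmatrix} X_+^{-1}
=
\begin{bmatrix} I \\ 0 \\ 0 \\ \vdots \end{bmatrix},
\tag{12}
$$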



where 

$$
\begin{aligned}
C_1 &= CQ^{-1}C, \\
Q_1 &= Q + CQ^{-1}C^* + C^*Q^{-1}C, \\
X_1 &= Q + C^*Q^{-1}C.
\end{aligned}
\tag{13}
$$



Thus, after one step of cyclic reduction, we obtain a Hermitian system, with 
the structure of (9), where the diagonal blocks are positive definite matrices. 
Moreover, observe that the function 



$$\psi_1(\lambda) = \lambda C_1 + Q_1 + \lambda^{-1}C_1^*,$$

is such that $\psi_1(\lambda) > 0$ for any $\lambda \in S$, since
$\psi_1(\lambda) = Q + (C^* + \lambda C)Q^{-1}(C + \lambda^{-1}C^*)$.
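If we apply cyclic reduction to system (12), we generate the sequence (9), for
$n \ge 1$, where

$$
\begin{aligned}
C_{n+1} &= C_nQ_n^{-1}C_n, \\
Q_{n+1} &= Q_n - C_nQ_n^{-1}C_n^* - C_n^*Q_n^{-1}C_n, \\
X_{n+1} &= X_n - C_n^*Q_n^{-1}C_n, \quad n \ge 1,
\end{aligned}
\tag{14}
$$

and $C_1$, $Q_1$ and $X_1$ are defined in (13).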

Without any assumption, the following convergence result can be proved [12]: 



Theorem 7. For the matrices $X_n$, $Q_n$, $n \ge 1$, defined in (14), it holds:

1. $0 < X_{n+1} \le X_n$, $0 < Q_{n+1} \le Q_n$, $n = 1, 2, \ldots$

2. $Q_n$ and $Q_n^{-1}$ are bounded in norm.

Moreover, for any operator norm $\|\cdot\|$ and for any $\sigma$, $\rho(X_+^{-1}C) < \sigma < 1$, it holds

$$\|I - X_nX_+^{-1}\| = O(\sigma^{2 \cdot 2^n}), \qquad \|C_n\| = O(\sigma^{2^n}).$$



From the above theorem, the quadratic convergence is always guaranteed,
and the condition number of $Q_n$ is always bounded. The resulting algorithm
has the same features, in terms of computational cost and convergence
properties, as the algorithm for the solution of (1).
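The procedure of this section can be sketched as follows: one special first step given by (13), followed by the same recursion as in Section 3. This is only an illustrative sketch; the function name `max_solution_minus`, the stopping rule and the test data are assumptions, not part of [12].

```python
import numpy as np

def max_solution_minus(C, Q, tol=1e-14, max_iter=60):
    """Sketch for X - C^* X^{-1} C = Q: step (13), then the recursion of (10)."""
    C = np.array(C, dtype=complex)
    Q = np.array(Q, dtype=complex)
    Ch = C.conj().T
    S = np.linalg.solve(Q, C)            # Q^{-1} C
    T = np.linalg.solve(Q, Ch)           # Q^{-1} C^*
    Cn = C @ S                           # C_1 = C Q^{-1} C
    Xn = Q + Ch @ S                      # X_1 = Q + C^* Q^{-1} C
    Qn = Q + C @ T + Ch @ S              # Q_1 = Q + C Q^{-1} C^* + C^* Q^{-1} C
    for n in range(1, max_iter + 1):
        if np.linalg.norm(Cn, 1) <= tol * np.linalg.norm(Xn, 1):
            return Xn, n
        Cn_h = Cn.conj().T
        S = np.linalg.solve(Qn, Cn)      # Q_n^{-1} C_n
        T = np.linalg.solve(Qn, Cn_h)    # Q_n^{-1} C_n^*
        P = Cn_h @ S                     # C_n^* Q_n^{-1} C_n
        Xn, Qn, Cn = Xn - P, Qn - Cn @ T - P, Cn @ S
    raise RuntimeError("C_n did not become negligible")

# Hypothetical data; by Theorem 3 a maximal solution exists for any C when Q > 0.
C = np.array([[1.0, 0.5], [0.0, 1.0]])
Q = np.eye(2)
X, steps = max_solution_minus(C, Q)
residual = X - C.conj().T @ np.linalg.solve(X, C) - Q
```

In the scalar case $C = Q = 1$ the equation reads $x - 1/x = 1$, and the iterates $2$, $5/3$, $34/21$, $1597/987$ approach the golden ratio $(1 + \sqrt{5})/2$.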



5 Computation of the Maximal Solution of 
-X + Q - C*{R + B*XB)-^C = 0 

Let us assume that the hypotheses of Theorem 4 are satisfied, and that the
matrix $B$ is nonsingular. Hence $X$ is a solution of (4) if and only if $X$ solves

$$-B^*XB + B^*QB - B^*C^*(R + B^*XB)^{-1}CB = 0.$$
Thus, if we define $Y = R + B^*XB$, then the matrix $Y$ solves the matrix equation

$$-Y + R + B^*QB - B^*C^*Y^{-1}CB = 0. \tag{15}$$

The latter equation is of the form (1). Since by Theorem 4 $Y_+ = R + B^*X_+B > 0$,
equation (15) has a positive definite solution. Thus, from Theorem 1 it follows
that the function $\psi(\lambda)$ of (5) associated with equation (15) is regular and
$\psi(\lambda) \ge 0$ for all $\lambda \in S$. Hence, if $R + B^*QB$ is positive definite, we can apply the
algorithm described in Section 3 for the computation of the maximal solution $Y_+$
of (15), and we can recover $X_+$ by solving the linear equation $Y_+ = R + B^*X_+B$.
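The reduction of (4) to an equation of the form (1) can be sketched as below. Equation (15) is rewritten as $Y + (CB)^*Y^{-1}(CB) = R + B^*QB$ and passed to a cyclic reduction routine for (1). The names `solve_form_1` and `max_solution_riccati`, the stopping rule and the test matrices are assumptions made for this illustration.

```python
import numpy as np

def solve_form_1(C, Q, tol=1e-14, max_iter=60):
    """Recursion (10) for the maximal solution of Y + C^* Y^{-1} C = Q."""
    Cn = np.array(C, dtype=complex)
    Qn = np.array(Q, dtype=complex)
    Xn = Qn.copy()
    for _ in range(max_iter):
        S = np.linalg.solve(Qn, Cn)              # Q_n^{-1} C_n
        T = np.linalg.solve(Qn, Cn.conj().T)     # Q_n^{-1} C_n^*
        P = Cn.conj().T @ S                      # C_n^* Q_n^{-1} C_n
        Xn, Qn, Cn = Xn - P, Qn - Cn @ T - P, Cn @ S
        if np.linalg.norm(Cn, 1) <= tol * np.linalg.norm(Xn, 1):
            return Xn
    raise RuntimeError("C_n did not become negligible")

def max_solution_riccati(B, C, Q, R):
    """Solve (4) through (15): Y = R + B^* X B, with B assumed nonsingular."""
    B, C, Q, R = (np.array(M, dtype=complex) for M in (B, C, Q, R))
    Bh = B.conj().T
    Y = solve_form_1(C @ B, R + Bh @ Q @ B)      # maximal solution Y_+ of (15)
    # Recover X_+ = B^{-*} (Y_+ - R) B^{-1}
    return np.linalg.solve(Bh, Y - R) @ np.linalg.inv(B)

# Hypothetical data chosen so that the function psi of (15) is positive definite.
B = np.array([[2.0, 1.0], [0.0, 1.0]])
C = np.array([[1.0, 0.5], [0.0, 1.0]])
Q = 10.0 * np.eye(2)
R = np.eye(2)
X = max_solution_riccati(B, C, Q, R)
residual = -X + Q - C.conj().T @ np.linalg.solve(R + B.conj().T @ X @ B, C)
```

In the scalar case $B = R = C = 1$, $Q = 3$, equation (15) becomes $Y + 1/Y = 4$ with maximal solution $Y_+ = 2 + \sqrt{3}$, so that $X_+ = 1 + \sqrt{3}$.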
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Abstract. We test some envelope-following methods for first-order dif- 
ferential systems against their counterparts for second-order systems. 
While the latter methods are more efficient than the usual solvers designed
for second-order systems, they are more efficient than the envelope-following
methods for first-order systems only in problems of orbital dynamics
without dissipation.



1 Introduction 

The numerical solution of highly oscillatory systems is today one of the most 
challenging problems in the numerical solution of ordinary differential equations. 
Hamiltonian systems of classical mechanics that are close enough to integrable
ones share this type of solution. A complete review of the methods and problems
that arise when dealing with these ‘high oscillations’ can be found in [10].

The basic fact, roughly speaking, is that one of the components of the
solution (the angular variable in orbital problems) is a high-frequency oscillation
while the others are not, and the high-frequency component forces the solution
to be advanced very slowly. Furthermore, in these latter problems, the solutions
are quasi-periodic and the variations of the rest of the components are small,
even when the period of the angular variable is not small.

Among them, envelope-following methods for first-order ODE systems,
originally known as multi-revolution or generalized RK and multistep methods,
have been shown to be very efficient in long-term orbit calculations, as well as some
generalizations of linear multistep methods of the Störmer-Cowell type [3, 9, 7, 8].

In previous papers we have seen that methods for second-order systems of 
envelope-following type are faster than the usual second-order solvers but no 
comparison between both types of methods was done. This paper is aimed at 
comparing their performances. 

Description of the Methods 

Let us suppose that the IVP

$$y'(t) = f(t, y(t)), \quad t \in [t_0, L], \qquad y(t_0) = y_0, \tag{1}$$
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has a highly-oscillatory quasi-periodic type solution. 

In the envelope-following methods for first order systems, a discretization of
(1) is made by replacing the IVP by the difference equation

$$y_{n+1} - y_n = g_n, \quad n \in \mathbb{N}, \tag{2}$$

where $y_n$ stands for the approximation to the solution $y(t_n)$ at each of the nodes
of a grid $\{t_j = t_0 + jT \mid j = 0, 1, \ldots\}$, $T$ is an approximation to the quasi-period
of the solution and $g_n$ to the definite integral
$g(t_n, y(t_n)) = \int_{t_n}^{t_n + T} f(\tau, y(\tau))\,d\tau$,
where $y(\tau)$ is the solution to the initial differential equation with $(t_n, y(t_n))$ as
an initial condition.

A numerical solution to (2) is obtained on a uniform grid of stepsize $M$,
a positive integer, $\{t_j = t_0 + jH \mid j \ge 0,\ H = MT\}$, by using a linear relationship
which can be written in the form



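$$\rho(E, M)\,y_n = M\,\sigma(E, M)\,g_n, \quad n \ge 0, \tag{3}$$

with $\rho(\zeta, M) = \sum_{j=0}^{k} \alpha_j(1/M)\,\zeta^j$ and $\sigma(\zeta, M) = \sum_{j=0}^{k} \beta_j(1/M)\,\zeta^j$ the first and second
characteristic polynomials of the method and $E = E_T^M$, where $E_T$ stands for the
shift operator of length $T$.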
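To make the idea behind (2) and (3) concrete, the following sketch uses the simplest explicit one-step formula of this type, $y_{n+M} = y_n + M g_n$ (that is, $\rho(\zeta) = \zeta - 1$, $\sigma = 1$). This choice, the classical RK4 inner solver, the names `rk4` and `envelope_euler`, and the damped-rotation test problem are all assumptions for illustration; they are not the high-order methods tested in this paper.

```python
import cmath
import math

def rk4(f, t, y, t_end, steps):
    """Classical RK4 from t to t_end with a fixed number of steps (inner solver)."""
    h = (t_end - t) / steps
    for _ in range(steps):
        k1 = f(t, y)
        k2 = f(t + h / 2, y + h / 2 * k1)
        k3 = f(t + h / 2, y + h / 2 * k2)
        k4 = f(t + h, y + h * k3)
        y = y + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        t = t + h
    return y

def envelope_euler(f, t0, y0, T, M, outer_steps, inner_steps=200):
    """Outer formula y_{n+M} = y_n + M*g_n, with g_n from one inner period."""
    t, y = t0, y0
    for _ in range(outer_steps):
        g = rk4(f, t, y, t + T, inner_steps) - y   # g_n = y(t_n + T) - y_n, cf. (2)
        y = y + M * g                              # jump over M periods at once
        t = t + M * T
    return t, y

# Hypothetical test problem: fast rotation with a slow decay of the envelope.
omega, eps = 2 * math.pi, 1e-3
f = lambda t, z: (1j * omega - eps) * z
T = 2 * math.pi / omega
t_end, z_end = envelope_euler(f, 0.0, 1.0 + 0j, T, M=8, outer_steps=10)
exact = cmath.exp((1j * omega - eps) * t_end)
error = abs(z_end - exact)
```

Here only 10 of the 80 periods are actually integrated by the inner solver; the remaining ones are skipped by the outer formula, which is where the saving in function evaluations comes from.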

For the special IVP

$$y''(t) = f(t, y(t)), \quad t \in [t_0, L], \qquad y(t_0) = y_0, \quad y'(t_0) = y'_0, \tag{4}$$

the envelope-following methods that we proposed in [8] follow the same path,
i.e., the IVP (4) is replaced by the difference equation

$$y_{n+2} - 2y_{n+1} + y_n = g_n, \quad n \in \mathbb{N}, \tag{5}$$

whose solution is found at the points of a grid of stepsize $M$, as before,
through the relation

$$\rho(E, M)\,y_n = M^2 \sigma(E, M)\,g_n,$$

if we introduce the characteristic polynomials as before, and where $g_n$ now
stands for an approximation to $g(t_n, y(t_n))$, which admits
the integral expression

$$g(t_n, y(t_n)) = T^2 \int_0^1 (1 - s)\,\big[ f(t_n + (1 + s)T,\ y(t_n + (1 + s)T)) + f(t_n + (1 - s)T,\ y(t_n + (1 - s)T)) \big]\,ds,$$

where $y(t_n + (1 \pm s)T)$, $s \in (0, 1)$, is the solution to the initial differential equation
with $(t_n, y(t_n))$ as an initial condition.
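The integral expression for the second difference can be checked numerically. The sketch below assumes the expression carries the factor $T^2$ (which follows from integrating by parts) and uses the hypothetical test case $y = \sin t$, for which $y'' = f(t, y) = -y$; the Simpson helper and all names are illustrative.

```python
import math

def simpson(func, a, b, n=200):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (b - a) / n
    total = func(a) + func(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * func(a + k * h)
    return total * h / 3

y = math.sin                    # known solution of y'' = f(t, y) with f = -y
f = lambda t, yv: -yv
t_n, T = 0.3, 0.7               # arbitrary node and step length for the check

def integrand(s):
    tp = t_n + (1 + s) * T
    tm = t_n + (1 - s) * T
    return (1 - s) * (f(tp, y(tp)) + f(tm, y(tm)))

g_n = T ** 2 * simpson(integrand, 0.0, 1.0)
second_difference = y(t_n + 2 * T) - 2 * y(t_n + T) + y(t_n)
```

The two quantities `g_n` and `second_difference` agree up to quadrature error, which is the identity behind the difference equation (5).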
Finally, given the IVP




(6) 



the following discretization can be done 





consider a couple of multi-revolution algorithms or simply apply a formula for 



Consequently, we will generally have one or at most two sequences of
difference equations of the form

$$\rho_i(E, M)\,y_n = M^i \sigma_i(E, M)\,g_n, \quad n \ge 0, \quad i = 1, 2, \tag{7}$$

where $\rho_i(E, M)$, $\sigma_i(E, M)$, $i = 1, 2$, stand for the characteristic polynomials of
an envelope-following method for a first-order or a second-order ODE system,
respectively.

2 Numerical Properties

The accuracy of the numerical solution provided by those methods is measured
by the asymptotic expansion in powers of $h = M\varepsilon$ of the operator



where $\varepsilon$ is a measure of the variation of the true solution along a quasi-period,
which can generally be obtained from the value of the ‘period’ $T$ in a strictly
‘highly oscillatory’ problem, or numerically.

In [8] the following result is proved.




Theorem 1. The envelope-following method (7) is of order p if and only if one 
of the following conditions is satisfied: 

i) The linear forms 




satisfy 



CW=0, 0<r<p + i-l, i = l,2. 



C^;l ^0, * = 1 or 2. 
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ii) p*(e^) - l)V,(e^) = {h — > 0), i=l,2. 




have at $\zeta = 1$ a zero of order $p$.



where 



k 



k 



Mr{pi) = ^ afj 






are the moments of the polynomials $\rho_i$ and $\sigma_i$.

The stability of the algorithms can be characterized in terms of the root
condition of Dahlquist, i.e., the roots of $\rho_i(\zeta)$ lie in the closed unit disk and the
roots of modulus one have multiplicity at most $i$.

Likewise, it can be seen that the methods are convergent if their coefficients
satisfy the order conditions stated in the previous theorem and the stability
condition.

Taking $\rho_i(\zeta) = (\zeta - 1)^i \zeta^{k-i}$, the coefficients of the Adams-Bashforth and
Adams-Moulton-like methods are obtained when $i = 1$ and of the Störmer and
Cowell-like when $i = 2$.

3 Implementation of the Methods 

The methods have been implemented in the predictor-corrector form $P(EC)^mE$,
where $E$ means an evaluation of both $f_n$ and $g_n$ at each step. The latter requires
an integration over one period in the first-order algorithm and over two periods
in the second-order algorithms, starting from the approximation provided by the
explicit method used as a predictor, and follows the same pattern as described in [2,6].

The inner integration is performed by the codes DOP853 [5] and DGEAR of
the IMSL library in the first-order implementation, and by the Nyström method of
order 10 due to Hairer [4] and DGEAR in the second-order one.

The predictors are the $k = 11$ step methods of the Adams-Bashforth and
Störmer-like families, and the correctors the $k = 10$ step methods of the
Adams-Moulton and Cowell-like families, respectively.

4 Numerical Tests 

We consider as a first test problem Kepler’s planar problem in Cartesian
coordinates with initial conditions $p_1 = 0$, $p_2 = \sqrt{(1 + e)/(1 - e)}$, $q_1 = 1 - e$, $q_2 = 0$,
which is a standard test in orbital dynamics problems.
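For reference, the test problem can be set up as below, assuming the usual normalisation of the planar Kepler problem (unit mass and unit gravitational parameter, so that the period is $2\pi$ for every eccentricity $e$). The function names are hypothetical. The two invariants give a quick check of the initial data: the energy equals $-1/2$ and the angular momentum equals $\sqrt{1 - e^2}$.

```python
import math

def kepler_rhs(state):
    """Right-hand side of the planar Kepler problem for state (q1, q2, p1, p2)."""
    q1, q2, p1, p2 = state
    r3 = (q1 * q1 + q2 * q2) ** 1.5
    return (p1, p2, -q1 / r3, -q2 / r3)

def kepler_initial(e):
    """Initial conditions at the pericentre for eccentricity e (0 <= e < 1)."""
    return (1.0 - e, 0.0, 0.0, math.sqrt((1.0 + e) / (1.0 - e)))

def energy(state):
    q1, q2, p1, p2 = state
    return 0.5 * (p1 * p1 + p2 * p2) - 1.0 / math.hypot(q1, q2)

def angular_momentum(state):
    q1, q2, p1, p2 = state
    return q1 * p2 - q2 * p1

state0 = kepler_initial(0.5)
quasi_period = 2.0 * math.pi     # natural choice of T for the outer integration
```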

The following precision-work diagrams show the performance of the
integrators. The abscissa is the global error in the uniform norm at the end point of
integration and the ordinate is the number of function evaluations, both on a
logarithmic scale.
Figure 1 shows the performance of the multi-revolution algorithms for first
order systems, denoted MRABM10, and the DOP853 solver. The inner integration
has been implemented both with variable and with constant stepsizes.
The graphs on the top and in the bottom right corner of the figure correspond to
eccentricity e = 0.1, and the graph in the bottom left corner to e = 0.5.

Symbols in the figures are o for the MRABM10 method with M = 4, + for the
DOPRI8 and DOP853, o for the MRABM10 method with M = 8, * for the
MRABM10 method with M = 16, V for the MRABM10 method with M = 32,
X for the MRABM10 method with M = 64 and > for the MRABM10 method
with M = 128.



Short term ( e=0.1 ) Medium term ( e=0.1 ) 







Fig. 1. Performance of multirevolution algorithms for first order systems 



The short term integration is taken for 320 periods and stepsizes of external 
integration of 4, 8 and 16 periods. The medium and long term integration for 
2560 and 20000 periods and stepsizes of 4, 8, 16, 32, 64 and 128. The $P(EC)^mE$
mode considered takes $m = 2$. The inner integration is performed by the DOPRI8
solver with a fixed number of steps of 32, 64, 128, 256 and 512 by period in the 
short and medium term integration and with the variable step solver DOP853 
in the long term case with tolerances of the same order of magnitude of the 
local error in the step-fixed implementation. As a conclusion we can say that the 
multi-revolution algorithm matches the global error propagation of the inner 
integration but with greater efficiency. 

When the eccentricity grows, a greater precision is needed in the inner inte- 
gration in order to achieve the same global error and smaller stepsizes must be 
taken in the outer integration. The same behaviour is shown when a longer time 
of integration is considered. 



A Numerical Comparison between Multi-revolution Algorithms 591 



Figure 2 shows the performance of the algorithms for second-order systems.
The graph on the left corresponds to the implementation with a couple of
multi-revolution algorithms, denoted MRABMSC10, and the graph on the right
corresponds to the implementation with only a Störmer-Cowell-like formula, denoted
MRSC10. Symbols are as before, joined by a dash-dot line in the implementation
with a couple of multi-revolution algorithms and by a dashed line in the
direct implementation, with A for the Nyström method. In view of the greater
stability shown by the SC-like formula, it was the only one used to carry out the
following numerical experiments. One possible reason for this behaviour is
the one given in [1], i.e., the eigenvalues of the numerical operator, when a pair
of difference equations is used, are not directly related to the eigenvalues of the
initial differential operator.



Short term efficiency diagram ABM-SC ( e=0.1 
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Fig. 2. Performance of multirevolution algorithms for second order systems 



Fig. 3. Performance of multirevolution algorithms for second order systems 



The performance of the envelope-following methods for second-order systems
reproduces that obtained with their counterparts for first order systems, as
is shown in Figure 3. The inner integration is performed by the Nyström code
with a fixed number of 32, 64, 128 and 256 steps per period in the medium-term
integration and with 32 steps per period in the long-term run. Though two numerical
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Short term ( e=0.5 ) Medium term ( e=0.5 ) 





Fig. 4. Multirevolution algorithms for first-order systems versus the algorithms
for second-order systems



integrations are needed at each step of the algorithms for second-order equations,
a greater precision is obtained with fewer function evaluations due to the higher
order of the inner solver and its smaller number of stages per step.

Finally, Figure 4 compares the efficiency of the envelope-following methods
for first-order systems with that of the algorithms for second-order
systems. The mode considered now in the $P(EC)^mE$ implementation
has been $m = 1$ for both.
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Fig. 5. Envelope-following methods for first-order systems versus the algorithms 
for second-order systems 



Figure 5 corresponds to the scalar problem of pure resonance of harmonic 
motion described by IVP: 

y” + io^y = WOsinut, y(0) = 1, ?/'(0) = — 0.05 

with uj = 10^. The inner integrations have been carried out with tolerances 
of 10“® and the run last about 2500 periods. The inner solver considered has 
been DGEAR. Symbols are o for the DGEAR code, o for the MRSC10 code
and * for the MRABM10 code. Even though the number of function evaluations
in this problem grows faster for second order algorithms, a better precision is 
maintained. 
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5 Conclusions 

Both envelope-following methods for first-order and second-order ODE systems 
match the global error propagation of the inner integration but with greater ef- 
ficiency. In problems without dissipation a greater precision is obtained with the 
algorithms specifically designed for second-order ODE systems due to the possi- 
bility of also using inner solvers specifically designed for second-order systems as 
they reach higher orders with a smaller number of stages. When the same kind 
of solvers is used in the inner integration, the first-order algorithms require a 
smaller number of function evaluations in the run although less precision is also 
obtained but, in any case, a deeper experimentation is needed. 
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Abstract. Plane stagnation point flow is one of a small class of prob- 
lems for which a self-similar solution of the incompressible Navier-Stokes 
equations exists. The self-similar solution and its derivatives can be ex- 
pressed in terms of the solution of a transformed problem comprising 
a partially coupled system of quasilinear ordinary differential equations 
defined on a semi-infinite interval. In this paper a novel iterative numer- 
ical method for the solution of the transformed problem is described and 
used to compute numerical approximations to the self-similar solution 
and derivatives. The numerical method is layer-resolving which means 
that for each of the components, error bounds of the form $C_pN^{-p}$ can
be calculated where $C_p$ and $p$ are independent of the Reynolds number,
showing that these numerical approximations are of controllable accu- 
racy. 



1 Introduction 

Plane stagnation point flow arises when fluid flowing in the direction of the
negative $y$-axis impacts on an infinite flat plate $P = \{(x, 0) \in \mathbb{R}^2\}$. The fluid
separates into two streams which flow in opposite directions along the plate away 
from the stagnation point at x = y = 0. For large values of the Reynolds number 
Re, the vorticity is limited to a thin parabolic boundary layer on the plate whose 
thickness is independent of x [1]. 

This flow is one of a small class of problems for which an exact solution of 
the steady, incompressible Navier-Stokes equations exists. The exact solution is 
self-similar and therefore can be written in terms of the solution of a transformed 
problem, consisting of a system of ordinary differential equations defined on a 
semi-infinite domain $\eta \in (0, \infty)$, where $\eta = y\sqrt{Re}$. Standard numerical methods
approximate the solution of this transformed problem on a finite domain, whose 
length is determined through a numerical process [2]. This means that the sim- 
ilarity solution and its derivatives are available for only a limited range of Re 
and so $Re$-uniform bounds cannot be obtained.
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In this paper, a numerical method that is robust and layer-resolving in the 
sense of [3] is described for the construction of approximate solutions of the 
transformed problem on the entire semi-infinite interval. The ordinary differen- 
tial equations are solved on a finite interval whose length is dependent on the 
numerical width of the boundary layer, and the solution values are extended to 
the infinite domain using carefully chosen extrapolation formulae. The similarity 
solution and derivatives are then formed in terms of the transformed solution for 
values of $Re$ in the range $[1, \infty)$. Experimental $Re$-uniform error bounds, which
are calculated in the maximum norm, are presented for the similarity solution 
and derivatives. These show that numerical approximations of any desired accu- 
racy can be computed with this method. It should be noted that the use of the 
maximum norm is essential because other norms, for example the energy norm, 
do not detect parabolic boundary layers [3] . 



2 Problem Formulation 



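Incompressible plane stagnation point flow in the domain $D = \{(x, y) \in \mathbb{R}^2 :
y > 0\}$ is governed by the Navier-Stokes equations, which can be written in the
dimensionless form

$$
(P_{NS}) \left\{
\begin{array}{l}
\text{Find } u,\ v \text{ and } p \text{ such that for all } (x, y) \in D \\[4pt]
\dfrac{\partial u}{\partial x} + \dfrac{\partial v}{\partial y} = 0 \\[8pt]
u\dfrac{\partial u}{\partial x} + v\dfrac{\partial u}{\partial y} = -\dfrac{\partial p}{\partial x} + \dfrac{1}{Re}\left(\dfrac{\partial^2 u}{\partial x^2} + \dfrac{\partial^2 u}{\partial y^2}\right) \\[8pt]
u\dfrac{\partial v}{\partial x} + v\dfrac{\partial v}{\partial y} = -\dfrac{\partial p}{\partial y} + \dfrac{1}{Re}\left(\dfrac{\partial^2 v}{\partial x^2} + \dfrac{\partial^2 v}{\partial y^2}\right) \\[8pt]
x = y = 0: \quad \Delta p = 0 \\[4pt]
y = 0: \quad u = v = 0 \\[4pt]
y \to \infty: \quad u \to x, \quad v \to -y, \quad \Delta p \to \tfrac{1}{2}(x^2 + y^2).
\end{array}
\right.
\tag{1}
$$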



Here $\Delta p = p_0 - p$ and $p_0$ is the pressure at the stagnation point $x = y = 0$. The
boundary conditions for u, v and p far above the plate are given by the solution of 
the corresponding irrotational flow problem, and the no-slip condition is satisfied 
on the surface of the plate [4]. 



2.1 Transformed Problem 

The partial differential equations in (Pns) are reduced to two ordinary differen- 
tial equations by performing a separation of variables and introducing a simple 
transformation of variables to rid the resulting equations and boundary condi- 
tions of constants. The self-similar solution of (Pns) has the form 

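$$u = x f'(\eta), \qquad v = -\frac{1}{\sqrt{Re}}\,f(\eta), \qquad \Delta p = \frac{1}{2}x^2 + \frac{1}{Re}\,g(\eta), \tag{2}$$

where $\eta = y\sqrt{Re}$ and $f(\eta)$, $g(\eta)$ satisfy the transformed problem

$$
(P_T) \left\{
\begin{array}{ll}
\text{Find } f \text{ and } g \text{ such that for all } \eta \in (0, \infty) & \\[4pt]
f''' + f f'' - (f')^2 + 1 = 0 & (3) \\[4pt]
g' = f'' + f f' & (4) \\[4pt]
f(0) = f'(0) = g(0) = 0 & (5) \\[4pt]
f'(\eta) \to 1 \ \text{ as } \eta \to \infty. & (6)
\end{array}
\right.
$$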

Here the prime denotes differentiation with respect to $\eta$. An expression for $g$
is obtained by integrating the $g'$ equation and using the boundary condition
$g(0) = 0$,

$$g = f' + \frac{f^2}{2}. \tag{7}$$
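As a point of comparison only, the $f$ equation of the transformed problem can be solved on a finite interval by ordinary shooting. The sketch below integrates with classical RK4 and bisects on the unknown $f''(0)$; it is not the layer-resolving continuation method of this paper. The truncation length $L = 5$, the step count, the bracket $[1.2, 1.3]$ and all function names are assumptions. The classical value for this flow is $f''(0) \approx 1.2326$.

```python
def rhs(state):
    """First-order form of f''' + f f'' - (f')^2 + 1 = 0 for state (f, f', f'')."""
    f, fp, fpp = state
    return (fp, fpp, -f * fpp + fp * fp - 1.0)

def integrate(s, L=5.0, n=500):
    """RK4 from eta = 0 to L with f(0) = f'(0) = 0 and f''(0) = s."""
    h = L / n
    y = (0.0, 0.0, s)
    for _ in range(n):
        k1 = rhs(y)
        k2 = rhs(tuple(a + h / 2 * b for a, b in zip(y, k1)))
        k3 = rhs(tuple(a + h / 2 * b for a, b in zip(y, k2)))
        k4 = rhs(tuple(a + h * b for a, b in zip(y, k3)))
        y = tuple(a + h / 6 * (b + 2 * c + 2 * d + e)
                  for a, b, c, d, e in zip(y, k1, k2, k3, k4))
    return y

def shoot(lo=1.2, hi=1.3, iters=40):
    """Bisection on s = f''(0) so that f'(L) = 1 (f'(L) increases with s)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if integrate(mid)[1] > 1.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

s_star = shoot()
f_L, fp_L, fpp_L = integrate(s_star)
g_L = fp_L + 0.5 * f_L ** 2          # g from (7), evaluated at eta = L
```

A fixed truncation length such as the one used here is exactly what limits the range of $Re$ for standard methods, which motivates the choice of $L$ and the extrapolation described in the next section.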

In practice, $Re$-uniform numerical approximations to the derivatives of the
self-similar solution are required. Expressions for these derivatives which are of
order one as $Re \to \infty$ are given in equations (8) - (11) below. It should be noted
that scaling $\partial u/\partial y$ by $1/\sqrt{Re}$ is necessary to ensure finite values for large values
of $Re$.



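$$\frac{\partial u}{\partial x} = -\frac{\partial v}{\partial y} = f'(\eta) \tag{8}$$

$$\frac{1}{\sqrt{Re}}\,\frac{\partial u}{\partial y} = x f''(\eta) \tag{9}$$

$$\frac{\partial \Delta p}{\partial x} = x \tag{10}$$

$$\frac{\partial \Delta p}{\partial y} = g'(\eta)/\sqrt{Re}. \tag{11}$$

Note that $\partial v/\partial x = 0$.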

In the next section, a numerical method for the solution of the third order
quasilinear equation for $f$ in $(P_T)$ is described. The numerical approximations
to $f$ are then used to calculate approximations to $g'$ and $g$.
3 Robust Layer-Resolving Numerical Method

To obtain similarity solutions for all values $Re \in [1, \infty)$, solutions of the
transformed problem $(P_T)$ must be found for all values $\eta \in (0, \infty)$. In Farrell et
al. [3], a problem that is similar to $(P_T)$ is solved numerically over a finite
interval $\eta \in (0, L)$ and extrapolation formulae are used for $\eta \ge L$. The appropriate
choice of $L$ is related to the numerical width of the boundary layer and is found
by studying the singularly perturbed nature of the $f$ equation. In the present
case it is given by $L = L(N) = \ln N$, where $N$ is the discretisation parameter
of the transformed problem. The following extrapolation formulae for $f$, $g$ and
their derivatives are derived using the asymptotic nature of the solution of $(P_T)$



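$$f(\eta) = \eta - L + f(L) \quad \text{for all } \eta \ge L \qquad (12)$$

$$f'(\eta) = 1 \quad \text{for all } \eta \ge L \qquad (13)$$

$$f''(\eta) = 0 \quad \text{for all } \eta \ge L \qquad (14)$$

$$g(\eta) = 1 + \tfrac{1}{2}(\eta - L + f(L))^2 \quad \text{for all } \eta \ge L \qquad (15)$$

$$g'(\eta) = \eta - L + f(L) \quad \text{for all } \eta \ge L. \qquad (16)$$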



The transformed problem $(P_T)$ is discretised by replacing derivatives in the $f$
and $g$ equations by finite difference operators, which are defined on a uniform
mesh in $(0, L)$. The mesh is given by

$$\bar I_u^N = \{\eta_i : \eta_i = i N^{-1} \ln N, \ 0 \le i \le N\}. \qquad (17)$$
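The mesh (17) and the extrapolation formulae (12) - (16) can be sketched as follows. The function names are hypothetical, and `f_at_l` stands for a value $f(L)$ assumed to be supplied by whatever solver is used on $(0, L)$.

```python
import math


def uniform_mesh(n):
    """Mesh points eta_i = i * ln(n) / n for 0 <= i <= n, as in (17)."""
    big_l = math.log(n)
    return [i * big_l / n for i in range(n + 1)]


def extrapolate(eta, big_l, f_at_l):
    """Return (f, f', f'', g, g') at eta >= L from formulae (12)-(16)."""
    if eta < big_l:
        raise ValueError("extrapolation formulae apply only for eta >= L")
    f = eta - big_l + f_at_l
    return f, 1.0, 0.0, 1.0 + 0.5 * f * f, f
```

For example, `uniform_mesh(512)` covers $(0, \ln 512)$ with 512 equal subintervals, and `extrapolate` supplies values beyond the last mesh point.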

An iterative method is required for the solution of the quasilinear $f$ equation.
Here the following continuation algorithm, analogous to that described in [3], is
used

For each integer $m$, $1 \le m \le M$, find $F^m$ on $\bar I_u^N$
such that for all $\eta_i \in I_u^N$, $2 \le i \le N - 1$,

$$\delta^2(D^- F^m) + F^{m-1} D^+(D^- F^m) - (D^- F^{m-1})(D^- F^m) = -1 \qquad (18)$$

$$F^m(0) = D^+ F^m(0) = 0 \quad \text{and} \quad D^0 F^m(\eta_{N-1}) = 1$$

with starting values $F^0(\eta_i)$ prescribed at all mesh points $\eta_i \in \bar I_u^N$,



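where $F(\eta_i) = F^M(\eta_i)$, and for any mesh function $\Phi_i = \Phi(\eta_i)$

$$\delta^2 \Phi_i = \frac{(D^+ - D^-)\Phi_i}{(\eta_{i+1} - \eta_{i-1})/2}, \qquad
D^+ \Phi_i = \frac{\Phi_{i+1} - \Phi_i}{\eta_{i+1} - \eta_i}, \qquad
D^- \Phi_i = \frac{\Phi_i - \Phi_{i-1}}{\eta_i - \eta_{i-1}}.$$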



Once $F$ is obtained the values of $G$ and $DG$ are calculated using the formulae

$$G(\eta_i) = D^+ F(\eta_i) + \tfrac{1}{2} F^2(\eta_i) \qquad (19)$$

$$DG(\eta_i) = D^+ D^+ F(\eta_i) + F(\eta_i)\, D^+ F(\eta_i) \qquad (20)$$

and numerical approximations to the self-similar solutions are found using
equations (2) - (4) and (8) - (11).

4 Results and Discussion 

In this section the errors in the numerical approximations to the self-similar
solution and its derivatives are determined on the rectangular domain $\Omega =
(0, L_x) \times (0, L_y)$ in the $x$-$y$ plane, where $L_x$ and $L_y$ are independent of $Re$. The
global error $E$ in the numerical approximation $\Phi$ to a function $\phi$, on the closed
domain $\bar\Omega$ is defined by

$$E(\Phi) = \|\bar\Phi - \phi\|_{\bar\Omega} = \max_{(x,y) \in \bar\Omega} |\bar\Phi(x, y) - \phi(x, y)|, \qquad (21)$$

where $\bar\Phi$ denotes the piecewise linear interpolant of $\Phi$ to each point of $\bar\Omega$.

Consider first the global error in $\bar U^N$, the numerical approximation to $u$
calculated on the mesh $\bar I_u^N$ with $N$ subintervals. Recalling that $\eta = y\sqrt{Re}$, by
(2) for each value of $Re$ and $N$ the error can be written as

$$E_{Re}^N(\bar U^N) = \|\bar U^N - u\|_{\bar\Omega} = \|x\,\overline{D^+ F} - x f'\|_{\bar\Omega}. \qquad (22)$$

This means that the global error in $\bar U^N$ can be found directly from the global
error in $\overline{D^+ F}$ evaluated for $\eta \le L_y \sqrt{Re}$. Analogous formulae for the global
errors in the remaining numerical approximations are

W"" ~^\\n= - f\\y<L,VRe (23) 

= h ■ 1 ^"' ■ ^ 

VRe (25) 

bk VHS (26) 

11^"^ -%\\n= - /'ll,<L„ (27) 

bk , (28) 

where $\partial$ denotes the forward difference $D^+$ transformed to the $x$-$y$ plane. By definition

( 29 ) 
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( 30 ) 

as $\partial v/\partial x = \partial_x V^N = 0$ and $\partial \Delta p/\partial x = \partial_x \Delta P^N = x$. In the results that follow, $L_x$
and $L_y$ are equal to one and the numerical approximations have been constructed
for a range of values of $Re \in R_{Re} = \{2^j\}_{j=0 \ldots 20}$ and $N \in R_N = \{2^j\}_{j=9 \ldots 19}$.

Global errors in $\bar U^N$ are presented in Table 1 for various values of $Re \in R_{Re}$
and $N \in R_N$. As the exact solution component $u$ in closed form is unknown for
this problem, we replace it in the formula for $E_{Re}^N$ by the numerical approximation
calculated on the finest available mesh, where $N = 524288$. For
$N = 512, 2048, 8192, 32768$, the $Re$-uniform global error

$$E^N = \max_{Re \in R_{Re}} E_{Re}^N \qquad (31)$$

is shown in the last row of the table. $E^N(\bar U^N)$ is the maximum global error in $\bar U^N$ for
a particular $N$ and all available values of $Re$. It is seen that its values decrease
rapidly as $N$ increases, which implies that an error bound of the form $C_p N^{-p}$
can be found, where $C_p$ and $p$ are independent of $Re$. This demonstrates
computationally that $\bar U^N$ converges $Re$-uniformly to $u$. Analogous results are obtained
for each of the other components.



Table 1. $E_{Re}^N(\bar U^N)$ and $E^N(\bar U^N)$ for various values of $Re \in R_{Re}$ and $N \in R_N$

Re \ N    512              2048             8192             32768
2^0       1.676 × 10^-4    5.162 × 10^-5    1.506 × 10^-5    4.099 × 10^-6
2^…       1.676 × 10^-4    5.162 × 10^-5    1.506 × 10^-5    4.099 × 10^-6
2^…       1.786 × 10^-4    5.337 × 10^-5    1.543 × 10^-5    4.187 × 10^-6
2^20      1.786 × 10^-4    5.337 × 10^-5    1.543 × 10^-5    4.187 × 10^-6
E^N       1.786 × 10^-4    5.337 × 10^-5    1.543 × 10^-5    4.187 × 10^-6



Realistic estimates of the $Re$-uniform global error parameters $C_p$ and $p$ are
found experimentally using the double mesh technique, a complete description of
which is contained, for example, in [3]. First, for any mesh function $\Phi^N$ defined
on $\bar\Omega$, the global two-mesh differences

$$D_{Re}^N(\Phi^N) = \max_{(x,y) \in \bar\Omega} |\bar\Phi^N - \bar\Phi^{2N}| \qquad (32)$$
are calculated for each $N$ satisfying $N, 2N \in R_N$ and $Re \in R_{Re}$, and the
$Re$-uniform global two-mesh differences

$$D^N = \max_{Re \in R_{Re}} D_{Re}^N \qquad (33)$$

are determined. The $Re$-uniform order of convergence is then taken to be the
minimum value $p$ of $p^N$, where for each $N$ satisfying $N, 2N, 4N \in R_N$

$$p^N = \log_2 \frac{D^N}{D^{2N}}. \qquad (34)$$

The $Re$-uniform error constant $C_p$ is given by

$$C_p = \max_{N \in R_N} C_p^N = \max_{N \in R_N} \frac{D^N N^p}{1 - 2^{-p}}. \qquad (35)$$
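The double mesh estimates can be sketched in a few lines. The sketch below assumes that the two-mesh differences $D^N$ are already available in a dictionary keyed by $N$ (a hypothetical data layout), and that the order is computed as $\log_2(D^N / D^{2N})$, the usual form of the double mesh technique.

```python
import math


def two_mesh_orders(diffs):
    """Map N -> log2(D^N / D^(2N)) for every N whose double 2N is also present."""
    return {
        n: math.log2(d / diffs[2 * n])
        for n, d in diffs.items()
        if 2 * n in diffs
    }


def error_constant(diffs, p):
    """C_p = max over N of D^N * N^p / (1 - 2^(-p)), as in (35)."""
    return max(d * n ** p / (1.0 - 2.0 ** (-p)) for n, d in diffs.items())


# Synthetic differences generated from an exact bound C * N^(-p),
# used only to show that the estimators recover C and p.
C_TRUE, P_TRUE = 0.05, 0.9
synthetic = {
    n: C_TRUE * (1.0 - 2.0 ** (-P_TRUE)) * n ** (-P_TRUE)
    for n in (512, 1024, 2048, 4096)
}
orders = two_mesh_orders(synthetic)
p_est = min(orders.values())
c_est = error_constant(synthetic, p_est)
```

With measured differences the computed orders vary with $N$, which is why the minimum is taken before the constant is evaluated.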



Values of $D_{Re}^N$, $D^N$ and $p^N$ are presented in Table 2 for $\bar U^N$. The corresponding
values of $p$ and $C_p$ are $8.728 \times 10^{-1}$ and $4.211 \times 10^{-2}$ for all $N \ge 2048$. Analogous
results are obtained for each of the other components.



Table 2. $D_{Re}^N(\bar U^N)$, $D^N(\bar U^N)$ and $p^N(\bar U^N)$ for various values of $Re \in R_{Re}$ and $N \in R_N$

Re \ N    512              2048             8192             32768
2^0       7.856 × 10^-5    2.400 × 10^-5    7.145 × 10^-6    2.080 × 10^-6
2^…       7.856 × 10^-5    2.400 × 10^-5    7.145 × 10^-6    2.080 × 10^-6
2^…       8.174 × 10^-5    2.462 × 10^-5    7.303 × 10^-6    2.123 × 10^-6
2^20      8.174 × 10^-5    2.462 × 10^-5    7.303 × 10^-6    2.123 × 10^-6
D^N       8.174 × 10^-5    2.462 × 10^-5    7.303 × 10^-6    2.123 × 10^-6
p^N       8.641 × 10^-1    8.728 × 10^-1    8.876 × 10^-1    9.014 × 10^-1



The following computed error bounds for the self-similar solution and its
derivatives, of the form $C_p N^{-p}$, are valid for all $N \ge 2048$,

0.042 

||V^ 1.03 

\\AP^ - ApWjj < 0.487 AT-o-86 



(36) 

(37) 

(38) 
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11^"^ 0.042 (39) 

(40) 

0.042 iV-O-87 (41) 

(42) 



These error bounds are $Re$-uniform and allow numerical approximations to the
self-similar solutions to be calculated with controllable accuracy. For example,
for $\bar U^N$ the upper bound on the error is $5.5 \times 10^{-5}$, which implies that at
least 4 digits in $\bar U^N$ are accurate provided that $N$ is chosen so that $N \ge 2048$.
Accuracy can be increased by increasing $N$. Error bounds valid for lower values
of $N$ can also be obtained.
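The arithmetic behind this example can be reproduced directly from a bound of the form $C_p N^{-p}$. The sketch below uses the constants 0.042 and 0.87 quoted in the bounds above; the helper names and the simple digit count are assumptions for illustration.

```python
import math


def error_bound(n, c_p=0.042, p=0.87):
    """Evaluate the computed error bound C_p * N^(-p)."""
    return c_p * n ** (-p)


def guaranteed_digits(bound):
    """Largest integer d with bound < 10^(-d), a rough count of accurate digits."""
    return math.floor(-math.log10(bound))


bound_2048 = error_bound(2048)
digits_2048 = guaranteed_digits(bound_2048)
```

Evaluating the same function at larger $N$ shows how many subintervals are needed for a prescribed tolerance.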

5 Conclusions 

In the case of plane stagnation point flow, $Re$-uniform numerical approximations
to the self-similar solution and its derivatives have been generated. Error
bounds for these components show that the numerical method is robust and
layer-resolving, allowing numerical approximations of controllable accuracy to
be computed independently of the value of $Re$.
Abstract. In this paper we study explicitly the pivot structure of the
Hadamard matrices of order 16. For each representative of the five
equivalence classes we examine the pivot structures that appear, and we give
tables summarising the 34 different structures attained. We give ten
examples of 16 × 16 Hadamard matrices, all coming from Class I, for which
the fourth last pivot is 8 when Gaussian Elimination with complete
pivoting is applied to them.
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1 Introduction 



Let A be an n X n real matrix, let = A, and let , fc = 1, . . . , n — 1, be 

the {n — k) x (n — k) matrix derived from A by the Gaussian Elimination (GE). 
If we partition A^^'^ as 






(G Ak) 

LLf’ Q 



where the scalar is known as the pivot at the k-th stage of the elimination, 
then 



We say that a matrix $A$ is completely pivoted (CP) or feasible if the rows and
columns have been permuted so that Gaussian elimination with no pivoting
satisfies the requirements for complete pivoting. Let $g(n, A) = \max_{i,j,k} |a_{ij}^{(k)}| / |a_{11}^{(0)}|$
denote the growth factor of the Gaussian elimination on a CP $n \times n$ matrix $A$
and $g(n) = \sup\{ g(n, A) \}$. The problem of determining $g(n)$ for various values
of $n$ is called the growth problem.

The determination of g{n) remains a challenging problem. Wilkinson in [9,10] 
noted that there were no known examples of matrices for which g{n) > n. 
In [1] Cryer conjectured that “$g(n, A) \le n$, with equality if and only if $A$ is
a Hadamard matrix”. In [6] a matrix of order 13 is given having growth larger
than 13. Interesting results on the size of pivots appear when GE is applied to
CP skew-Hadamard and weighing matrices of order $n$ and weight $n - 1$. In these
matrices the growth is also large and experimentally it is believed that it equals
$n - 1$ [7].

A Hadamard matrix $H$ of order $n$ is an $n \times n$ matrix with elements ±1 and
$H H^T = nI$. For more details and construction methods of Hadamard matrices
we refer the reader to the book [5].
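To make the pivot and growth definitions concrete, the following sketch applies Gaussian elimination with complete pivoting, in exact rational arithmetic, to the Sylvester Hadamard matrix of order 16. The Sylvester construction, the function names and the tie-breaking rule (first largest entry in row-major order) are choices made here for illustration; other equivalent matrices or tie-breaking rules may give different intermediate pivots. The input is assumed to be nonsingular.

```python
from fractions import Fraction


def sylvester_hadamard(k):
    """Return the 2^k x 2^k Sylvester Hadamard matrix as a list of rows."""
    h = [[1]]
    for _ in range(k):
        top = [row + row for row in h]
        bottom = [row + [-x for x in row] for row in h]
        h = top + bottom
    return h


def gecp(matrix):
    """Gaussian elimination with complete pivoting in exact arithmetic.

    Returns (pivots, growth): the absolute values of the pivots in order,
    and the largest |entry| met in any reduced matrix divided by the
    largest |entry| of the input.
    """
    a = [[Fraction(x) for x in row] for row in matrix]
    n = len(a)
    start = max(abs(x) for row in a for x in row)
    largest = start
    pivots = []
    for k in range(n):
        # Locate the entry of largest modulus in the trailing block.
        best, r, c = Fraction(-1), k, k
        for i in range(k, n):
            for j in range(k, n):
                if abs(a[i][j]) > best:
                    best, r, c = abs(a[i][j]), i, j
        largest = max(largest, best)
        # Bring it to position (k, k) with one row swap and one column swap.
        a[k], a[r] = a[r], a[k]
        for row in a:
            row[k], row[c] = row[c], row[k]
        pivots.append(best)
        # Eliminate the entries below the pivot.
        for i in range(k + 1, n):
            m = a[i][k] / a[k][k]
            for j in range(k, n):
                a[i][j] -= m * a[k][j]
    return pivots, largest / start


pivots, growth = gecp(sylvester_hadamard(4))
```

The list `pivots` is one pivot structure in the sense used below, and the product of its entries equals $|\det H| = 16^8$, which gives a quick consistency check on any reported pattern.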

Since Wilkinson’s initial conjecture seems to be connected with Hadamard
matrices it is important to study the growth problem for these matrices
(see [1,2,8]). In the present paper we study the pivot structures that arise when
we apply GE operations on CP Hadamard matrices of order 16. After testing at
least 200000 Hadamard matrices, the following conjecture was posed:

Conjecture (The growth conjecture for Hadamard matrices of or- 
der 16) 

Let A be a 16 × 16 CP Hadamard matrix. Reduce A by GE. Then

1. g(16,A) = 16. 

2. The four last pivots are equal to ^ or 16- 

3. The fifth last pivot can take the values ^ or 

4. The sixth last pivot can take the values or 

5. The seventh last pivot can take the values or 

6. The eighth last pivot can take the values 2, |, |, or 

7. The first six pivots are equal to 1,2, 2, 4, 2 or 3, y or | or 4. 

8. The seventh pivot can take the values 2, 4, or 

9. The eighth pivot can take the values 4, |, or 

The equality in 1. above has been proved for a certain class of 16 × 16
Hadamard matrices [2]. Cryer [1] has shown 2. for the three last pivots. Day
and Peterson [2] have shown that the values $\frac{n}{4}$ or $\frac{n}{2}$ appear in the fourth last pivot
when Gaussian Elimination (not necessarily with complete pivoting) is applied
to a Hadamard matrix of order $n$. They posed the conjecture that when Gaussian
elimination with complete pivoting is done on a Hadamard matrix the value
of $\frac{n}{2}$ is impossible for the fourth last pivot. In [3] a Hadamard matrix of order 16
is given which has fourth last pivot $\frac{n}{2}$. We found 10 matrices of order 16 having
8 as fourth last pivot.

The values in 7 are proved in [2] for the first five values, 1,2, 2, 4, 2 or 3, and 
experimental evidence in [8] and this paper strongly supports the next values 
and also the values in 5., 6., 8. and 9. 



Notation 1. We use — for —1 in matrices in this paper. 
2 Pivot Structures for Hadamard Matrices of Order 16

A Hadamard matrix $H$ of order $n$ is an $n \times n$ matrix of +1's and −1's such that

$$H H^T = nI.$$

This equation is equivalent to the assertion that any two rows of $H$ are orthogonal.
Clearly, permuting rows or columns of $H$ or multiplying rows or columns
of $H$ by −1 leaves this property unchanged, and we consider such matrices
equivalent. If $H_1$ and $H_2$ are equivalent Hadamard matrices, then

$$H_2 = P H_1 Q$$

where $P$, $Q$ are monomial permutation matrices of +1's and −1's. By this we
mean that $P$ and $Q$ have exactly one nonzero entry in every row and in every
column, and this nonzero entry is +1 or −1. $P$ gives the permutation and change
of sign of rows; $Q$ of columns. Given a Hadamard matrix, we can always find one
equivalent to it whose first row and first column consist entirely of +1's. Such
a Hadamard matrix is called “normalized”. Permuting rows except the first, or
columns except the first, leaves a normalized matrix normalized, but in general
there may be equivalent normalized matrices that are not equivalent by merely
permuting rows and columns.

When Gaussian elimination with complete pivoting (GECP) is applied to equivalent
matrices, different pivot structures may be attained. For Hadamard matrices of
order 16 it is proved in [4] that there are 5 equivalence classes, and examples of
each are given.

For each class we took a representative, generated 40000 matrices equivalent to
it, and applied GECP to each of them. For class I we found 9 different pivot patterns.
For class II we found 18 different pivot patterns, for class III we found 21 different
pivot patterns, whereas classes IV and V gave 12 different pivot patterns, which
were the same for both classes since classes IV and V are transposes of each
other and thus are identical for the purpose of GECP [2]. The following tables
summarize the different pivot structures attained for each class that are also
different among all classes.

3 The Fourth Last Pivot 

The following matrices are CP Hadamard matrices. When Gaussian Elimination
is applied on them they give the following pivot structure

$$(1, 2, 2, 4, 3, \tfrac{8}{3}, 2, 4, 4, 4, 4, 8, 8, 8, 8, 16).$$

Thus they have their fourth last pivot equal to 8. All of them belong to
Class I. The matrix in [3], which also gives 8 as fourth last pivot, also belongs to
Class I.
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Table 1. 





growth 


Class I- Pivot Pattern 


1 


16 


(1,2, 2, 4, 2, 4, 4, 8, 2,4, 4, 8,4,8,8,16) 


2 


16 


(1, 2, 2, 4, 2, 4, 4, ^ 4, ^ ^ , 4, 8, 8, 16) 


3 


16 


(1,2,2,4,2,4,4,4,§.4, 4, 8,4,8,8,16) 


I 


16 


(l,2,2,4,3,|,4,d^,|,4, 4, 8,4,8,8,16) 


■5 


16 


(l,2,2,4,3,|,4,g^,|,4,^,f,4,8,8,16) 




16 


(1,2,2,4,3,|,2, 4, 4 , 8 , A, ^,4,8,8, 16) 


7 


16 


(1,2, 2, 4, 3, 1,2, 4, 4,8, 4, 8,4,8,8,16) 


8 


16 


(1,2, 2, 4, 3, 2 2, 4, 4,4, 8, 8,4,8,8,16) 


9 


16 


(1,2, 2, 4, 3, 1,2, 4, 4,4, 4, 8,8,8,8,16) 



Table 2. 





growth 


Class II- Pivot Pattern 


1 


16 


(1, 2, 2, 4, 2, 4, 4, f , f , 4, 8, 8, 16) 


2 


16 


(1,2, 2, 4,2, 4, 4,^,^, 4, ^,^,4,8,8,16) 


3 


16 


(1,2, 2, 4, 2, 4, 4, 4, 4 , f , 4, 8, 8, 16) 


4 


16 


(1,2, 2, 4, 2, 4, 4, 4, 4, 4, ^,4,8,8,16) 


5 


16 


(1,2, 2, 4, 2, 4, 4, 4, 4, 4, 4, 8, 4,8,8,16) 


6 


16 


(1,2,2,4,2,4,4,^,^, 4, 4, 8,4,8,8,16) 


7 


16 


/I 9 9 0 lU 8 lb lb lb /I ft ft 1 fi') 

yi, z,, z, ^, 0 , 3 , j^Q /3 , 3 , , ^Q /3 , 3 , 0 , 0 , lu; 


8 


16 


(1,2,2, 4 , 3 ,^, f, 4,4,4, ^,^,4,8,8, 16) 


9 


16 


(1, 2, 2, 4, 3, f , , 4, f , 4, 4, 8 , 4, 8, 8, 16) 


10 


16 


(1, 2, 2, 4, 3, 4, 4, ^,4, 8 , 8, 16) 


11 


16 


(l,2,2,4,3,iii,f , 4,4,4,4,8,4,8,8,16) 


12 


16 


( 1 , 2 , 2 , 4, 3,^, 4,^, ^,4, 8 , 8,16) 


13 


16 


0 0 /] 0 lU lb lb lb lb lb lb /| 0 0 

yi, z, z, 0 , 3 , 3 , , 3 , ^g ^3 , ^Q /3 , 3 , 0 , 0 , lu; 


14 


16 


(1,2,2, 4 , 3 ,^, 4, 4, 8,4,8,8,16) 


15 


16 


(l,2,2,4,3,f ,^, 4, f, 4 ,^, f ,4,8,8,16) 



Table 3. 





growth 


Class III- Pivot Pattern 


1 


16 


(1,2,2, 4,2,4, 4, 4,|,^,^,f, 4,8,8,16) 


2 


16 


(1,2,2, 4,2,4, 4, |, 4 ,^, 4,8,8,16) 


3 


16 


(l,2,2,4,3,f,f,4,4,^,i4.f^4,8,8,16) 


4 


16 


(1, 2, 2, 4, 3, 1 , 4, 4, 4, 4, 8, 8, 16) 


5 


16 


(1,2,2, 4, 3,|, 4, 4, 4, 4,^, 4 , 4, 8 , 8,16) 


6 


16 


(1,2, 2, 4, 3, 1,4, 4, 4, 4, 4, 8, 4, 8, 8, 16) 


7 


16 


(1, 2, 2, 4, 3, 1 , 4, ^ , f , 4, ^ , f , 4, 8, 8, 16) 


8 


16 


0 0 A ^ ^ A zl ft ft 1 

z, z, 0 , 3 , ^g/3 , 3 , j^g/3 , ^Q/3 , 3 , 0 , 0 , TU; 


9 


16 


(1, 2 , 2 , 4, 3 , 1 , 4, ^ , f , 4, 4, 8, 4, 8, 8, 16) 


10 


16 


(1,2,2,4,3,|,4,4, |, j^,^,^,4,8,8,16) 
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" iiiiii - ii - iiiii - 

1 - 1-1 1 1 

11 111 11 11 - 

1 1 1-1 1 - 

111 i - i - i-i 11 

i - i - i-i 1 - 1 - 1 - 

-1 1-1 1 1 1 - 1 - 

1 1111111-1 

1 1-1 1 1 1 1 

11-11 1-1 11-1 

11 - 1 - 11-1 1-11 

-1 1 1 1-1 1 1 

- 1-1 1 1 1 1 1 

1 1 11111-11 

11111 1-11111 

_i 1 1 1 1 1 1 1-1 

" 1-1 1 1 1 1 1 

111 - 1111-1111 

1 1 - 111111-1 

- 111111-11 1 

- 1 - 1-1 1-1 1 1 

- 1 - 11 - 111-111 

1-11111 1-1 1 

11111 111111 - 

- 11 - 1 - 11-1 1-11 

1-1 1-1 1 1 1 

1 111 - 11 - 11 - 1 - 

1 1 1 1 1 1-1 

1111-1111 11111 

1 11 - 11 - 1 - 1 - 1-1 

111 - 11-11111 1 

_ 1 1 1 1 1 1 1 

r — 1 1-1 1 1 1 1 - 



- i - i - i-i 1 1 1 1 1 

11111 - 11 - 1 - 1 - 11 - 

1 — 1-1 1 1 1 — 1 - 

1 1 1 1 — 1 1 1 

111-11 — 1 — 11 - 1 - 
1111 - 11 - 111-1111 

1 1 1-1 1 1 1 — 

-1 — 1111 - 11 - 1 - 1 - 

1 111 - 11111 - 

1-1 — 111 - 1-11 — 1 

1 1 1 1 1 1 1 

1 — 11 — 111 — 1-11 
1 - 1111-1 — 1-11 — 

1 1-1 — 1 1 

_ — 1 1 — 1 1 i _ 

" 111111111111-1 — 

1-1 1-1 — 1-1 1 - 

- 11 - 111 - 1 - 111-11 



— 1 1 1 1 1 1 1 — 

1 1-1 — 1 1 1 1 

1 1 - 11 - 1111-1 

1 1 1 1 1 1 1 

1 — 1 1 1 1-1 1 

11 - 1-111 11 - 1 - 

1-111 — 11 - 1 - 1 - 1 - 

1 1 1 — 1 — 1 1 — 1 

_i — 1 — 1 1 — 1 

I - 1-1 i - i - i-i — 1 1 

1-1 — 1 1 1 1 — 1 - 

II - 11 11-11111 

1 1 — 1 1 1 — 1 1 - 



ri-iiiii — 111 — 11 



111 - 1 - 11 - 11-1 

1 1 1 1 1 1 1 - 

— 1 1 — 1 1-1 

1 — 1 - 11-1 — 111-1 
-11 — 111 — 111111 
11-1 — 11111 — 111 
11 — 1-1 — 11111 — 

1111 - 1 - 1 - 1 - 1-1 — 

111-11 — 11 — 1111 

1-1 1 1-1 1 1 - 

1 — 1 1 1 1-1 1 - 

- 1 - 1-1 1 — 1 — 1 - 1 - 

- 1-1 1 1-1 1 1 

1 1 1 1 1 

1 1 - 1-1 1-1 1 



" 1111-11 — 111-1 — 
1 - 1 - 1-11 — 11111 - 

1 — 1-1 1 1 1 - 

1 1 1 1 1 1 — 1 1 

11 - 11111111 - 1 - 1 - 

— 1-1 1 1 — 1 1 - 

1-1 1 1 1-1 1 — 1 

1 11 — 1111-111 

1 1 — 1 — 1-1 1 — 

1 1 1 1 1-1 1 1 — 

111 - 111-1 11-1 

- 1-1 1 - 1-1 — 1-1 1 - 

-1 — 1 1 1 1 1 

1 1 1 1 1-1 1 - 1 - 

1 — 1 1-1 — 1-1 1 — 1 
_- 11111 - 1 - 1-11111 
riii-i-i-iii-iiii 



1-1 1 1 1 1 1 — 1 - 

1 1 - 1-1 1 1 - 1 - 1 - 

1 1-1 1 

- 1 - 1111-1111 1 

1 - 1 - 11111-111 

1 1 1 1 1 1 

1 — 1 — 11 - 11111-1 

1-1 1-1 1-1 1 1 — 

1 1 1 — 1 1-1 1 — 1 

— 1 — 1 1 1-1 1 1 1 

11 — 1111 - 1 - 1 - 11 - 
1 — 111-111 — 1-11 
1 1 1 1-1 1 — 



1-1 1-1 1 - 1 - 

_i 1 — 1-1 1-1 1 1 

"1 1 1 — 1 1 - 1-1 1 1 

1 — 1-1 1 - 1-1 1 

1 — 1 1 1 1 1 1 

i _ i_i 1 1 

11-11111 1-1 

— 1-111 — 11 — 111 

111 11 — 111111 

1 - 1111 - 1 - 1 - 1111 - 

1-11 — 1111 11 

111 - 11 - 11-1 1 - 

1 1 — 1-1 — 1 — 1 - 1 - 
1 — 11 - 1 - 1 - 11 - 11 - 
- 1-11 — 1-111 — 11 

-1 1 1 1 1 - 

11111 111 - 11-1 

- 111111-1 — 11-11 
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■- 1111111-1111 



1 - 111 - 11-1 11 

111-1111 1-1 

_i 111111-11 

1 1 1-1 i-i-i-i-i 

Ill 1 11111 

11-111 1 11 1 

1 11 - 1 - 1111111 - 

1111-1-111-1-11- 

1-11 111-11 1 

- 1 - 1-11 11-111 

- 1-1 1-1 1 1 1 - 

-1 1-1 1-1 1-1 

1 1-1 

-111 1-11 11-1 

1 1 1 1 1 1 1 



"i-i-i 1 1 1 1 1 1 1 ■ 

1111 1111 - 1-111 

11 11-1111 

-1 1 1 - 1-1 1 

1 1 1-1 1 1 1-1 

1 1 1 1-1 1 1 1 

-1 1-1 1 1 1-1 1 

111 111 111 - 1 - 

- 11 - 11-11 1111 ’ 

1 1 1 1 1-1 1 1 - 

1-1 1 1 1 - 1 - 

111111 11111 1 

1 i-i-i-i 1 

1 - 1-1 1111111 

1-1 1 1-1 1 1 1 

1 1 1-1 1-1 



4 Conclusions 

In total, at least 34 different pivot patterns were found. From the above
results we see that the magnitudes of all the intermediate pivot elements are less
than 16, and this gives strong evidence that the growth for the Hadamard matrix
of order 16 is 16.

It is interesting to study the pivot structures for each class. Class I gave 
always as sixth pivot 4 or | and the fourth last pivot equal to 8 arised only from 
matrices coming from the first class. Class II gave always as sixth pivot 4 or ^ 
whereas Class III gave always as sixth pivot 4 , or A thorough classification 
of the appearing pivot structures for each class still remains an open issue. 
Abstract. Tikhonov regularization using SVD (Singular Value Decomposition)
is an effective method for discrete ill-posed linear operator equations.
We propose a new regularization method using Rank Revealing
QR Factorization which requires far less computational cost than that
of SVD. It is important to choose the regularization parameter well in order
to obtain a good approximate solution of the equation. For the choice of the
regularization parameter, generalized cross-validation (GCV) and the L-curve
method are often used. We apply these two methods to the regularization
using rank revealing QR factorization to produce a reasonable solution.



1 Introduction 



We consider the approximate solution of the linear discrete ill-posed problem

$$Ax = b, \quad A \in \mathbb{R}^{m \times n}, \quad m \ge n, \qquad (1)$$

where $A$ is an ill-conditioned matrix. Equation (1) arises from the discretization
of Fredholm integral equations of the first kind:

$$\int_a^b K(s, t) f(t)\,dt = g(s), \quad s_{\min} \le s \le s_{\max},$$

where $K(s, t)$ and $g(s)$ are known $L_2$ functions and $f(t)$ is the unknown function
in $L_2[a, b]$. Tikhonov regularization [2,5] is one of the practical methods for
this problem. This method uses the singular value decomposition (SVD) of the
coefficient matrix $A$:



$$A = U \Sigma V^T = \sum_{i=1}^{n} \sigma_i u_i v_i^T, \quad \sigma_1 \ge \cdots \ge \sigma_n \ge 0, \qquad (2)$$

where $\sigma_i$ are the singular values of $A$ and $u_i$ and $v_i$ are the $i$th left and right
singular vectors, respectively. Using this decomposition, the regularized solution
is given as follows:



$$x_\lambda = \sum_{i=1}^{n} f_{\lambda,i} \frac{u_i^T b}{\sigma_i} v_i, \quad f_{\lambda,i} = \frac{\sigma_i^2}{\sigma_i^2 + \lambda^2}, \qquad (3)$$

where $f_{\lambda,i}$ are called filter factors [5], which depend on the regularization
parameter $\lambda$.
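The effect of the filter factors is easy to see numerically. The sketch below assumes the standard Tikhonov form $f_{\lambda,i} = \sigma_i^2 / (\sigma_i^2 + \lambda^2)$ and positive singular values; the function names and the sample values are hypothetical.

```python
def filter_factors(sigmas, lam):
    """Tikhonov filter factors sigma^2 / (sigma^2 + lambda^2)."""
    return [s * s / (s * s + lam * lam) for s in sigmas]


def tikhonov_coefficients(sigmas, betas, lam):
    """Coefficients c_i of x_lambda = sum_i c_i v_i, where betas[i] = u_i^T b."""
    factors = filter_factors(sigmas, lam)
    return [f * beta / s for f, beta, s in zip(factors, betas, sigmas)]


# Components with sigma much larger than lambda pass almost unchanged,
# components with sigma much smaller than lambda are damped towards zero.
sample = filter_factors([1.0, 1e-3, 1e-8], 1e-3)
```

The damping of the small singular values is what limits the amplification of noise in $b$.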

In this paper, we propose a regularization method by a decomposition using 
rank revealing QR factorization (RRQR) [1,3], which requires less computational 
cost than that of SVD. 

The regularized solution depends on a regularization parameter which con- 
trols the influence of the noise in right-hand side b. Hence the choice of the 
regularization parameter is important to obtain a good approximate solution. 
Generalized cross-validation (GCV) [4] and the L-curve method [5] are methods 
to estimate the optimal regularization parameter for the Tikhonov regulariza- 
tion. We apply these two methods to the regularization using rank revealing QR 
factorization to determine a reasonable parameter. 



2 Regularization Method Using Rank Revealing QR 
Factorization 



In this section, we show the regularization method using a rank revealing QR
factorization, defined using the machine precision $\mu > 0$ as follows:

Definition 1. [3] Assume that a matrix $A \in \mathbb{R}^{m \times n}$ ($m \ge n$) has numerical
rank $r$. If there exists a permutation $\Pi$ such that

$$A\Pi = QR, \quad R = \begin{bmatrix} R_{11} & R_{12} \\ 0 & R_{22} \end{bmatrix}, \quad R_{11} \in \mathbb{R}^{r \times r},$$

and

$$\sigma_{\min}(R_{11}) \gg \|R_{22}\|_2 = O(\mu),$$

then the factorization $A\Pi = QR$ is called a Rank Revealing QR factorization
of $A$.

Here, the numerical rank $r$ of a matrix $A$ is defined as follows:

Definition 2. [3] A matrix $A \in \mathbb{R}^{m \times n}$ ($m \ge n$) has numerical rank $r$ if

$$\sigma_r \gg \sigma_{r+1} = O(\mu).$$

We consider the decomposition

$$A = U D R V^T, \quad D = \mathrm{diag}(d_1, \ldots, d_r), \quad d_1 \ge \cdots \ge d_r > 0, \qquad (4)$$

where

$$U = [u_1, \ldots, u_r] \in \mathbb{R}^{m \times r}, \quad V = [v_1, \ldots, v_r] \in \mathbb{R}^{n \times r}$$

are orthogonal matrices and $R \in \mathbb{R}^{r \times r}$ is a well-conditioned matrix. This
decomposition can be obtained using rank revealing QR factorization by the following
algorithm:
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1 . Compute the rank revealing QR factorization of : 

A^n = [R, F] 

and let R ^ [-Rii, ^ 12 ]- 

2. Compute the decomposition 

if = LD, L e 

where L is a lower triangular matrix whose diagonal elements are 1 and D 
is a diagonal matrix. 

3. ni. 

4. Compute the QR factorization L = UR where R G is an upper trian- 

gular matrix. 

5. R^ D-^itD. 

Then the least squares solution of minimal norm $x_{LS}$ for (1) can be given as

$$x_{LS} = \sum_{i=1}^{r} \frac{u_i^T b}{d_i} w_i, \quad W = V R^{-1} = [w_1, \ldots, w_r]. \qquad (5)$$

We define the regularized solution using the decomposition (4) as follows:

$$x_\lambda = \sum_{i=1}^{r} f_{\lambda,i} \frac{u_i^T b}{d_i} w_i, \qquad (6)$$

where $f_{\lambda,i}$ are the filter factors given by the regularization parameter $\lambda$. The
regularized solution (6), obviously, has the following characterization:

$$x_\lambda \to x_{LS} \quad \text{as} \quad \lambda \to 0,$$

and this solution satisfies the following theorem.

Theorem 1. Let $x_\lambda$ be as in (6). Then $x_\lambda$ is the unique solution of the
minimization problem

$$\min_{x \in X} \{ \|Ax - b\|_2^2 + \lambda^2 \|R V^T x\|_2^2 \}, \quad X = \mathrm{span}\{v_1, \ldots, v_r\}. \qquad (7)$$
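As a brief check of Theorem 1, the following sketch derives the form (6) from the minimization problem (7). It assumes that $U$ and $V$ have orthonormal columns, that $R$ is nonsingular, and that the filter factors take the Tikhonov form $f_{\lambda,i} = d_i^2 / (d_i^2 + \lambda^2)$, which is an assumption made here rather than a statement taken from the text.

```latex
\begin{align*}
x &= V y, \qquad z = R y \qquad (x \in X), \\
\|Ax - b\|_2^2 + \lambda^2 \|R V^T x\|_2^2
  &= \|U D z - b\|_2^2 + \lambda^2 \|z\|_2^2 \\
  &= \sum_{i=1}^{r} \left[ (d_i z_i - u_i^T b)^2 + \lambda^2 z_i^2 \right]
     + \|b - U U^T b\|_2^2, \\
z_i &= \frac{d_i \, u_i^T b}{d_i^2 + \lambda^2}
     = f_{\lambda,i} \, \frac{u_i^T b}{d_i}, \\
x_\lambda &= V R^{-1} z
     = \sum_{i=1}^{r} f_{\lambda,i} \, \frac{u_i^T b}{d_i} \, w_i .
\end{align*}
```

Each term of the sum is a strictly convex quadratic in $z_i$ when $\lambda > 0$ or $d_i > 0$, so the minimizer is unique.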

3 Choice of the Regularization Parameter 

To obtain the regularized solution (6), the regularization parameter A has to 
be chosen properly. In this section, we apply the two methods, GCV and the 
L-curve method, to the regularization using RRQR. 



7?11 Ri2 
0 R22 
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3.1 GCV 



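GCV [4] is a method to choose a regularization parameter which is the minimizer
of the following function:

$$G(\lambda) = \frac{\|A x_\lambda - b\|_2^2}{(\mathrm{trace}(I - A B(\lambda)))^2},$$

where $B(\lambda)$ is a matrix which satisfies $x_\lambda = B(\lambda) b$. Using the decomposition (4)
and the filter factors (6), this function is given as follows:

$$G(\lambda) = \frac{\sum_{i=1}^{r} \left( (1 - f_{\lambda,i})\, u_i^T b \right)^2 + \|\Delta b\|_2^2}{\left( m - \sum_{i=1}^{r} f_{\lambda,i} \right)^2}, \quad \Delta b = b - U U^T b. \qquad (8)$$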
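Once the decomposition is available, evaluating the GCV function costs only $O(r)$ operations per value of $\lambda$. The sketch below assumes Tikhonov-type filter factors $f_{\lambda,i} = d_i^2 / (d_i^2 + \lambda^2)$ and the form of $G(\lambda)$ obtained by inserting the decomposition (4) into the GCV quotient; the argument names (`d`, `beta` for the coefficients $u_i^T b$, `delta_b_sq` for $\|\Delta b\|_2^2$) and the grid search are hypothetical.

```python
def gcv(lam, d, beta, delta_b_sq, m):
    """GCV function for diagonal entries d, coefficients beta[i] = u_i^T b,
    squared norm delta_b_sq of b - U U^T b, and m rows in A."""
    f = [di * di / (di * di + lam * lam) for di in d]
    numerator = sum(((1.0 - fi) * bi) ** 2 for fi, bi in zip(f, beta))
    numerator += delta_b_sq
    denominator = (m - sum(f)) ** 2
    return numerator / denominator


def gcv_minimizer(grid, d, beta, delta_b_sq, m):
    """Return the grid value of lambda with the smallest GCV function."""
    return min(grid, key=lambda lam: gcv(lam, d, beta, delta_b_sq, m))
```

A logarithmically spaced grid of candidate values is a common and simple choice for `grid`.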



3.2 L-Curve Method 

The L-curve [5] is given by plotting the two values in the functional (7). This is the
graph of $(\|A x_\lambda - b\|_2, \|R V^T x_\lambda\|_2)$ for a large range of $\lambda$. The L-curve has a
corner and the corresponding regularization parameter is a good compromise
between the residual norm and the influence of the noise in $b$. Here, we use the
parameter which is the point of the L-curve with maximal curvature. Using the
decomposition (4), the curvature $\kappa(\lambda)$ of the L-curve is given as follows:

^ ^ |a(A)/3(A)/r-A^(g(A) + Ap(A))| 

(g(A) + A4/3(A))l 

where 

a(A) = \\Axx - b\\l = ^ + \\^b\\l, 

/3(A) = WRV^xxWl = ^ 

i—1 ^ ^ ' 

and 

^(rf? + Ap- 

The costs of the estimation using these two methods are of the same order as
those for SVD.



4 Numerical Results 

In this section, we test the linear operator equation obtained from the discretization
of a Fredholm integral equation of the first kind:

$$\int_0^1 e^{st} f(t)\,dt = \frac{e^{s+1} - 1}{s + 1}, \quad 0 \le s \le 1, \qquad (10)$$

which has a solution $f(t) = e^t$. For sample points of $s$, we used 200 random
numbers distributed uniformly and for the discretization of the integral form,
we used a Gauss integral rule with 100 points. Thus, the size of $A$ is $(m, n) =$
(200, 100). In order to test influence of errors in the right-hand side b, we added a 
normal distribution with zero mean and standard deviation 10“^ to each element 
of b. 
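That $f(t) = e^t$ solves the test equation (10) can be confirmed numerically. The sketch below uses a composite Simpson rule purely as a convenient stand-in; it is not the Gauss rule used in the paper, and the function names and the number of subintervals are assumptions.

```python
import math


def simpson(func, a, b, n=200):
    """Composite Simpson rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    total = func(a) + func(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * func(a + k * h)
    return total * h / 3.0


def integral_side(s):
    """Left-hand side of (10) with f(t) = exp(t)."""
    return simpson(lambda t: math.exp(s * t) * math.exp(t), 0.0, 1.0)


def closed_form(s):
    """Right-hand side of (10)."""
    return (math.exp(s + 1.0) - 1.0) / (s + 1.0)
```

Comparing `integral_side(s)` with `closed_form(s)` at a few values of $s$ in $[0, 1]$ gives agreement to within the quadrature error.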

We compare the properties of the regularization by RRQR with those of
Tikhonov’s method. The comparison of computational cost for each decomposition
of the coefficient matrix is shown in Table 1.

Table 1. CPU-time for the calculation of each decomposition

Decomposition        Average time (sec)
A = U Σ V^T          0.63
A = U D R V^T        0.06



Here, the numerical rank of A is r = 9 where ^ = 1.0 x 10“^®. Diagonal 
elements of each diagonal matrix and the coefficients of b corresponding to the 
orthogonal basis Ui are shown in Fig. 1. 



[Figure: two panels, SVD and RRQR; horizontal axis i from 0 to 10, logarithmic vertical axis from 1e-16 to 1e+02]

Fig. 1. Diagonal elements and coefficients for the right-hand side

As shown in Fig. 1, the diagonal elements and the corresponding coefficients 
of both methods have almost the same properties for every point i. 



4.1 GCV 



Here, we define the error function $e(\lambda)$ for the regularization parameter $\lambda$ as
follows:

$$e(\lambda) = \frac{\|x_\lambda - x_0\|_2}{\|x_0\|_2},$$

where $x_0$ is the true solution of the equation. Fig. 2 shows the errors $e(\lambda)$ and
the GCV function (8). The dashed-dotted lines in Fig. 2 are the estimates of
the optimal regularization parameter given by the GCV method. The regularized
solutions for each method with the estimated parameter are almost optimal.



[Figure: two panels, SVD and RRQR; horizontal axis λ]

Fig. 2. GCV function



4.2 L-Curve

The L-curves for both methods, $(\|A x_\lambda - b\|_2, \|x_\lambda\|_2)$ and
$(\|A x_\lambda - b\|_2, \|R V^T x_\lambda\|_2)$ respectively, are shown in Fig. 3. Both of the L-curves have a
corner at the point of \\Ax\ — b \\2 « 1.0 x 10“^. 



[Figure: two panels, SVD and RRQR]

Fig. 3. L-curves




The curvatures of the L-curves are shown in Fig. 4. The dashed-dotted lines 
in Fig. 4 are the regularization parameters with maximal curvatures. The pa- 
rameters given by the L-curve method are almost optimal. 

Table. 2 shows the comparison of the Optimal parameter, the parameter 
estimated by GCV and the parameter by L-curve and the corresponding errors 
for each regularization method. 
[Figure: two panels, SVD and RRQR; horizontal axis λ]

Fig. 4. Curvature of L-curve

Table 2. Regularization parameters and errors







Optimal 


GCV 


L-curve 


SVD 


A 


1.10 X 10"^ 


9.77 X 10"^ 


1.29 X 10"'' 




e(A) 


9.27 X lO"'^ 


9.37 X lO-'^ 


9.48 X 10"“ 


RRQR 


A 


2.19 X 10"^ 


1.10 X lO”"" 


2.34 X 10"^ 




e(A) 


2.50 X 10"'‘ 


3.90 X lO"'* 


6.86 X 10"“ 



5 Conclusions 

In this paper, we proposed a regularization method for discrete ill-posed linear 
operator equations using rank revealing QR factorization. The decomposition 
of the coefficient matrix requires far less computational cost than that of SVD. 
This method needs a good value of regularization parameter to obtain a good 
approximate solution. We applied the two methods, generalized cross-validation 
and the L-curve method, to obtain good estimations of the optimal regularization 
parameters. The costs for the choice of the parameters are the same order as 
that of Tikhonov’s method. In the numerical example of the Fredholm integral 
equation of the first kind, we have shown that the errors of the regularized 
solution are almost the same as that of Tikhonov regularized solution. 
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Abstract. This paper studies the application of preconditioned con- 
jugate gradient methods in high-resolution color image reconstruction 
problems. The high-resolution color images are reconstructed from mul- 
tiple undersampled, shifted, degraded color frames with subpixel dis- 
placements. The resulting degradation matrices are spatially variant. To 
capture the changes of reflectivity across color channels, the weighted $H_1$
regularization functional is used in the Tikhonov regularization. The
Neumann boundary condition is also employed to reduce the bound- 
ary artifacts. The preconditioners are derived by taking the cosine trans- 
form approximation of the degradation matrices. Numerical examples are 
given to illustrate the fast convergence of the preconditioned conjugate 
gradient method. 



1 Introduction 

In this paper, we consider the reconstruction of high-resolution color images 
from multiple undersampled, shifted, degraded and noisy color images which 
are obtained by using multiple identical color image sensors shifted from each 
other by subpixel displacements. We remark that color can be regarded as a 
set of three images in their primary color components: red, green and blue. The 
reconstruction of high-resolution color images can be modeled as solving 

$$g = Af + \eta, \qquad (1)$$

where $A$ is the reconstruction matrix, $\eta$ represents unknown Gaussian noise or
measurement errors, $g$ is the observed high resolution color image formed from
the low resolution color images and $f$ is the desired high resolution color image.

* Research supported by Hong Kong Research Grants Council Grant No. HKU 
7147/99P and HKU CRCG Grant No. 10202720. 

** Research supported in part by Hong Kong Research Grants Council Grant No. 
The observed and original color images can be expressed as

$$g = \begin{pmatrix} g^{(r)} \\ g^{(g)} \\ g^{(b)} \end{pmatrix}, \qquad f = \begin{pmatrix} f^{(r)} \\ f^{(g)} \\ f^{(b)} \end{pmatrix},$$

where $g^{(i)}$ and $f^{(i)}$ ($i \in \{r, g, b\}$) are the observed and the original color images
from the red, green and blue channels respectively. The multichannel degradation
matrix A is given by

$$A = \begin{pmatrix} A^{rr} & A^{rg} & A^{rb} \\ A^{gr} & A^{gg} & A^{gb} \\ A^{br} & A^{bg} & A^{bb} \end{pmatrix}. \tag{2}$$

Here the matrices $A^{ii}$ and $A^{ij}$ ($i \ne j$) represent the within-channel and the
cross-channel degradation matrices respectively. We remark that this formulation of
multichannel degradation was considered in [4].

In the case of grey-level high-resolution image reconstruction, where the 
model was proposed in [2], we have already developed a fast algorithm that 
is based on the preconditioned conjugate gradient method with cosine transform 
preconditioners, see [8]. In particular, we have shown that when the $L_2$ or $H_1$
norm regularization functional is used, the spectra of the preconditioned normal
systems are clustered around 1 and hence the conjugate gradient method
converges very quickly. For grey-level images, the use of the Neumann boundary
condition can reduce the boundary artifacts and we have shown that solving 
such systems is much faster than solving those with zero and periodic boundary 
conditions, see [8]. In the literature, the Neumann boundary condition has also 
been studied in image restoration [7,1,6]. 

The main aim of this paper is to extend our results in [8] from grey-level
images to color images, which are vector-valued grey-level images. We will extend
our fast and stable grey-level image processing algorithm with cosine transform
preconditioners to the color image reconstruction problem.

The outline of the paper is as follows. In Section 2, we give a mathematical 
formulation of the problem. In Section 3, we consider the image reconstruction 
problem when there are no errors in the subpixel displacements. An introduction 
on the cosine transform preconditioners will be given in Section 4. In Section 5, 
numerical results are presented to demonstrate the effectiveness of our method. 



2 The Mathematical Model 

We begin with a brief introduction of the mathematical model in high-resolution 
image reconstruction. Details can be found in [2,10]. 

Consider a sensor array with $L_1 \times L_2$ sensors, where each sensor has $N_1 \times N_2$
sensing elements (pixels) and the size of each sensing element is $T_1 \times T_2$. Our
aim is to reconstruct an image of resolution $M_1 \times M_2$, where $M_1 = L_1 \times N_1$
and $M_2 = L_2 \times N_2$. To maintain the aspect ratio of the reconstructed image, we



A Fast Algorithm for High-Resolution Color Image Reconstruction 617 



consider the case where $L_1 = L_2 = L$ only. For simplicity, we assume that L is
an even number in the following discussion.

In order to have enough information to resolve the high-resolution image,
there are subpixel displacements between the sensors. In the ideal case, the
sensors are shifted from each other by a value proportional to $T_1/L \times T_2/L$.
However, in practice there can be small perturbations around these ideal subpixel
locations due to imperfection of the mechanical imaging system. Thus, for
$l_1, l_2 = 0, 1, \ldots, L-1$ with $(l_1, l_2) \ne (0, 0)$, the horizontal and vertical
displacements $d^x_{l_1 l_2}$ and $d^y_{l_1 l_2}$ of the $[l_1, l_2]$-th sensor array with
respect to the $[0, 0]$-th reference sensor array are given by

$$d^x_{l_1 l_2} = \frac{T_1}{L}\left(l_1 + \epsilon^x_{l_1 l_2}\right) \quad \text{and} \quad d^y_{l_1 l_2} = \frac{T_2}{L}\left(l_2 + \epsilon^y_{l_1 l_2}\right).$$

Here $\epsilon^x_{l_1 l_2}$ and $\epsilon^y_{l_1 l_2}$ denote respectively the normalized horizontal and
vertical displacement errors.

We remark that the parameters $\epsilon^x_{l_1 l_2}$ and $\epsilon^y_{l_1 l_2}$ can be obtained by
manufacturers during camera calibration. We assume that

$$|\epsilon^x_{l_1 l_2}| < \frac{1}{2} \quad \text{and} \quad |\epsilon^y_{l_1 l_2}| < \frac{1}{2}, \qquad 0 \le l_1, l_2 \le L-1. \tag{3}$$



For if not, the low resolution images observed from two different sensor arrays 
will be overlapped so much that the reconstruction of the high resolution image 
is rendered impossible. 

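Let $f^{(r)}$, $f^{(g)}$ and $f^{(b)}$ be the original scene in red, green and blue channels
respectively. Then the observed low resolution image $g^{(i)}_{l_1 l_2}$ in the i-th
($i \in \{r, g, b\}$) channel for the $(l_1, l_2)$-th sensor is modeled by:

$$g^{(i)}_{l_1 l_2}[n_1, n_2] = \sum_{j \in \{r,g,b\}} w_{ij} \left\{ \int_{T_2(n_2 - \frac{1}{2}) + d^y_{l_1 l_2}}^{T_2(n_2 + \frac{1}{2}) + d^y_{l_1 l_2}} \int_{T_1(n_1 - \frac{1}{2}) + d^x_{l_1 l_2}}^{T_1(n_1 + \frac{1}{2}) + d^x_{l_1 l_2}} f^{(j)}(x_1, x_2) \, dx_1 \, dx_2 \right\} + \eta^{(i)}_{l_1 l_2}[n_1, n_2] \tag{4}$$

for $n_1 = 1, \ldots, N_1$ and $n_2 = 1, \ldots, N_2$. Here $\eta^{(i)}_{l_1 l_2}$ is the noise corresponding
to the $(l_1, l_2)$-th sensor in the i-th channel, and $w_{ii}$ and $w_{ij}$ ($i \ne j$) are the
within-channel and the cross-channel degradation parameters. We note that

$$w_{ij} \ge 0, \quad \text{and} \quad \sum_{j = r,g,b} w_{ij} = 1, \quad i \in \{r, g, b\}. \tag{5}$$

Details about these degradation parameters can be found in the multichannel
restoration model [4].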

To get the matrix representation (1), we intersperse the low resolution images
$g^{(i)}_{l_1 l_2}[n_1, n_2]$ to form an $M_1 \times M_2$ image by assigning

$$g^{(i)}[L(n_1 - 1) + l_1, L(n_2 - 1) + l_2] = g^{(i)}_{l_1 l_2}[n_1, n_2], \quad i \in \{r, g, b\}.$$

The image $g^{(i)}$ so formed is called the observed high-resolution image from
the i-th channel. Similarly, we define $\eta^{(i)}$. Using a column by column ordering
for $g^{(i)}$, $f^{(j)}$ and $\eta^{(i)}$, (4) becomes

$$g^{(i)} = \sum_{j \in \{r,g,b\}} w_{ij} H_L(\epsilon) f^{(j)} + \eta^{(i)}.$$

Writing it in matrix form, we get (1) with $A^{ij}$ in (2) given by

$$A^{ij} = w_{ij} H_L(\epsilon), \quad i, j \in \{r, g, b\}. \tag{6}$$
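The interspersing rule above can be sketched in a few lines. This is a minimal illustration only: the function name `interlace` and the dictionary layout of the frames are assumptions made for the example, and indices are shifted to 0-based form.

```python
def interlace(low_res, L, N1, N2):
    """Assemble an (L*N1) x (L*N2) grid from L*L low-resolution frames.

    low_res[(l1, l2)][n1][n2] is the frame of sensor (l1, l2); all indices
    are 0-based here, whereas the text counts n1 and n2 from 1.
    """
    M1, M2 = L * N1, L * N2
    high = [[None] * M2 for _ in range(M1)]
    for (l1, l2), frame in low_res.items():
        for n1 in range(N1):
            for n2 in range(N2):
                # 0-based form of g[L(n1-1)+l1, L(n2-1)+l2] = g_{l1 l2}[n1, n2]
                high[L * n1 + l1][L * n2 + l2] = frame[n1][n2]
    return high


# Hypothetical data: each pixel records which sensor and position it came from.
L, N1, N2 = 2, 2, 3
frames = {
    (l1, l2): [[(l1, l2, n1, n2) for n2 in range(N2)] for n1 in range(N1)]
    for l1 in range(L)
    for l2 in range(L)
}
grid = interlace(frames, L, N1, N2)
```

Every pixel of the assembled grid comes from exactly one sensor, so neighbouring high-resolution pixels are samples taken by different sensors.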



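Under the Neumann boundary condition assumption, the degradation matrix
corresponding to the $(l_1, l_2)$-th sensor is given by

$$H_{l_1 l_2}(\epsilon) = H^x_{l_1 l_2}(\epsilon) \otimes H^y_{l_1 l_2}(\epsilon).$$

Here $H^x_{l_1 l_2}(\epsilon)$ and $H^y_{l_1 l_2}(\epsilon)$ are banded Toeplitz-plus-Hankel matrices:

$$H^x_{l_1 l_2}(\epsilon) = (\text{banded Toeplitz matrix}) + (\text{banded Hankel matrix}), \tag{7}$$

and $H^y_{l_1 l_2}(\epsilon)$ is defined similarly.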

The degradation matrix for the whole sensor array is made up of degradation 
matrices from each sensor: 



$$H_L(\epsilon) = \sum_{l_1 = 0}^{L-1} \sum_{l_2 = 0}^{L-1} D_{l_1 l_2} H_{l_1 l_2}(\epsilon), \quad i, j \in \{r, g, b\}. \tag{8}$$

Here $D_{l_1 l_2}$ are diagonal matrices with diagonal elements equal to 1 if the
corresponding component of the observed low resolution image comes from the
$(l_1, l_2)$-th sensor and zero otherwise, see [2] for more details. From (6), we see
that we have the same matrix $H_L(\epsilon)$ within the channels and across the channels.
Therefore by (2), the overall degradation matrix is given by



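$$A_L(\epsilon) = \begin{pmatrix} w_{rr} & w_{rg} & w_{rb} \\ w_{gr} & w_{gg} & w_{gb} \\ w_{br} & w_{bg} & w_{bb} \end{pmatrix} \otimes H_L(\epsilon) = W \otimes H_L(\epsilon). \tag{9}$$

In the next subsection, we will show that $A_L(\epsilon)$ is ill-conditioned.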






2.1 Ill-Conditioning of the Degradation Matrices 

When there are no subpixel displacement errors, i.e., when all
$\epsilon^x_{l_1 l_2} = \epsilon^y_{l_1 l_2} = 0$, the matrices $H^x_{l_1 l_2}(0)$ and also $H^y_{l_1 l_2}(0)$ are the
same for all $l_1$ and $l_2$. We will denote them simply by $H^x_L$ and $H^y_L$.
In this particular case, the eigenvalues of $H_L = H^x_L \otimes H^y_L$ can be computed
easily as the matrix can be diagonalized by the 2-dimensional cosine
transform $C_{M_1} \otimes C_{M_2}$ [8].



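Lemma 1. [8, Theorem 1] Under the Neumann boundary condition, the eigenvalues
of $H_L$ are given by

$$\lambda_{(i-1)M_2 + j}(H_L) = \left(\frac{4}{L}\right)^2 \cos^2\left(\frac{(i-1)\pi}{2M_1}\right) \cos^2\left(\frac{(j-1)\pi}{2M_2}\right) p_L\left(\frac{(i-1)\pi}{M_1}\right) p_L\left(\frac{(j-1)\pi}{M_2}\right) \tag{10}$$

for $1 \le i \le M_1$, $1 \le j \le M_2$. Here

$$p_L\left(\frac{(i-1)\pi}{M_1}\right) = \begin{cases} \displaystyle\sum_{k=1}^{L/4} \cos\left(\frac{(i-1)(2k-1)\pi}{M_1}\right), & \text{when } L = 4l, \\[2ex] \displaystyle\frac{1}{2} + \sum_{k=1}^{(L-2)/4} \cos\left(\frac{(i-1)2k\pi}{M_1}\right), & \text{otherwise.} \end{cases}$$

In particular, by choosing $i = M_1$ with $j = M_2$ and $i = j = 1$, we have

$$0 \le \lambda_{\min}(H_L) \le O\left(\frac{1}{M_1^2 M_2^2}\right) \quad \text{and} \quad \lambda_{\max}(H_L) = 1. \tag{11}$$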
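The one-dimensional factor of the eigenvalue formula in Lemma 1 can be checked numerically. The sketch below rests on an assumption not spelled out in the text above: that the error-free 1-D blur is the averaging kernel (1/L)[1/2, 1, ..., 1, 1/2] applied with symmetric (Neumann) reflection at the boundary. The helper names `p_L`, `eig_formula`, `blur_neumann` and `max_residual` are illustrative only. It confirms that the cosine vectors are eigenvectors with eigenvalue (4/L) cos^2(theta/2) p_L(theta), where theta = (i-1)pi/M.

```python
import math


def p_L(L, theta):
    """The trigonometric factor p_L from Lemma 1 (L is assumed even)."""
    if L % 4 == 0:
        return sum(math.cos((2 * k - 1) * theta) for k in range(1, L // 4 + 1))
    return 0.5 + sum(math.cos(2 * k * theta) for k in range(1, (L - 2) // 4 + 1))


def eig_formula(L, M, i):
    """One-dimensional factor of (10): (4/L) cos^2(theta/2) p_L(theta)."""
    theta = (i - 1) * math.pi / M
    return (4.0 / L) * math.cos(theta / 2) ** 2 * p_L(L, theta)


def blur_neumann(x, L):
    """Apply the assumed kernel (1/L)[1/2, 1, ..., 1, 1/2] with reflection."""
    M = len(x)
    half = L // 2
    weights = [0.5] + [1.0] * (L - 1) + [0.5]
    out = []
    for n in range(M):
        s = 0.0
        for w, m in zip(weights, range(-half, half + 1)):
            idx = n + m
            if idx < 0:
                idx = -idx - 1          # reflect across the left edge
            elif idx >= M:
                idx = 2 * M - 1 - idx   # reflect across the right edge
            s += w * x[idx]
        out.append(s / L)
    return out


def max_residual(L, M):
    """Largest entry of H v - lambda v over all M cosine vectors."""
    worst = 0.0
    for i in range(1, M + 1):
        v = [math.cos((n + 0.5) * (i - 1) * math.pi / M) for n in range(M)]
        Hv = blur_neumann(v, L)
        lam = eig_formula(L, M, i)
        worst = max(worst, max(abs(a - lam * b) for a, b in zip(Hv, v)))
    return worst
```

With L = 4 and M = 64 the factor for i = 33 is cos(pi/2) times a bounded term, so `eig_formula(4, 64, 33)` vanishes up to rounding, which is the source of the singular example quoted after Theorem 1.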



In practical applications, see [4], the within-channel degradation is always
stronger than the cross-channel degradation, i.e.,

$$w_{ii} > w_{ij}, \quad \text{for } j \ne i, \text{ and } i \in \{r, g, b\}. \tag{12}$$

Under this assumption and using (5), we can prove that W is nonsingular. 

Lemma 2. Let W be a matrix with entries satisfying (5) and (12). Then W is
nonsingular. Moreover, we have

$$0 < \delta = \lambda_{\min}\{W^t W\} \le \lambda_{\max}\{W^t W\} \le 2, \tag{13}$$

where $\delta$ is a positive constant independent of $M_1$ and $M_2$.

Proof. By (5), it is easy to show that 1 is an eigenvalue of W with corresponding
eigenvector $[1, 1, 1]^t$. Since the coefficients of the characteristic polynomial of W
are real, the other two eigenvalues of W are in a conjugate pair. Suppose that W
is a singular matrix, then W must be a rank one matrix, i.e.,

$$W = [u_1, u_2, u_3]^t [v_1, v_2, v_3],$$

for some $u_i, v_i$. By (5), we can choose all $u_i, v_i \ge 0$. Also by (5), we have

$$u_1(v_1 + v_2 + v_3) = u_2(v_1 + v_2 + v_3) = u_3(v_1 + v_2 + v_3) = 1$$

or $u_1 = u_2 = u_3 = 1/(v_1 + v_2 + v_3)$. It implies that $w_{ij} = v_j/(v_1 + v_2 + v_3)$.
However, this contradicts assumption (12) and hence W is nonsingular. In
particular, we have the first inequality in (13). Since all the entries $w_{ij}$ of W are
independent of $M_1$ and $M_2$, we see that $\delta$ is also independent of $M_1$ and $M_2$.
By (5) and (12), we have $\|W\|_1 \le 2$ and $\|W\|_\infty = 1$. It follows that
$\|W\|_2^2 \le \|W\|_1 \|W\|_\infty \le 2$.
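As a quick sanity check of Lemma 2, the quantities in the proof can be evaluated for the two 3-by-3 weight matrices used later in the numerical examples. The helper functions below are illustrative assumptions; the largest eigenvalue of W^t W is estimated by a plain power iteration rather than computed exactly.

```python
import math


def det3(W):
    """Determinant of a 3-by-3 matrix by cofactor expansion."""
    (a, b, c), (d, e, f), (g, h, i) = W
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def norm1(W):
    """Maximum absolute column sum."""
    return max(sum(abs(W[r][c]) for r in range(3)) for c in range(3))


def norm_inf(W):
    """Maximum absolute row sum."""
    return max(sum(abs(x) for x in row) for row in W)


def lambda_max_gram(W, iters=500):
    """Power-iteration estimate of the largest eigenvalue of W^t W."""
    x = [1.0, 1.0, 1.0]
    lam = 0.0
    for _ in range(iters):
        y = [sum(W[r][c] * x[c] for c in range(3)) for r in range(3)]   # W x
        z = [sum(W[r][c] * y[r] for r in range(3)) for c in range(3)]   # W^t W x
        lam = sum(a * b for a, b in zip(x, z)) / sum(a * a for a in x)
        nrm = math.sqrt(sum(a * a for a in z))
        x = [a / nrm for a in z]
    return lam


W_i = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]]
W_ii = [[0.5, 0.3, 0.2], [0.25, 0.5, 0.25], [0.3, 0.2, 0.5]]

for W in (W_i, W_ii):
    bound = norm1(W) * norm_inf(W)
    print(det3(W), norm1(W), norm_inf(W), lambda_max_gram(W), bound)
```

Both determinants are nonzero and the estimated largest eigenvalue stays below the product of the 1-norm and the infinity-norm, in line with (13).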

Combining Lemmas 1 and 2 and using the tensor product structure (9) of $A_L$,
we get its condition number.

Theorem 1. Let W be a matrix with entries satisfying (5) and (12). Under the
Neumann boundary condition, if $H_L$ is nonsingular, then the condition number
$\kappa(A_L)$ of $A_L$ satisfies

$$\kappa(A_L) \ge O(M_1^2 M_2^2).$$

According to Theorem 1, $A_L$ can be very ill-conditioned or singular. For
example, when L = 4 and $M_1 = M_2 = 64$, $\lambda_{33}(H_L) = 0$. By continuity
arguments, $A_L(\epsilon)$ will still be ill-conditioned if the displacement errors are small.
Therefore, a regularization procedure should be imposed to obtain a reasonable
estimate of the original image.



2.2 Regularization 



In the case of grey-level image reconstruction, the regularization operator only 
needs to enforce the spatial smoothness of the image. The most usual form of 
this operator is the discrete version of the 2-dimensional Laplacian. However, in 
color image reconstruction, in addition to the within-channel spatial smoothness, 
the cross-channel smoothness must also be enforced. One may incorporate the 3- 
dimensional discrete Laplacian here. However, color planes are highly correlated 
and this operator may fail to capture the cross-channel similarities, see [4] . 

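In [4], Galatsanos et al. have proposed the following weighted discrete
Laplacian matrix R as the regularization matrix:

$$[Rf]_{r,j,k} = 6[f^{(r)}]_{j,k} - [f^{(r)}]_{j-1,k} - [f^{(r)}]_{j+1,k} - [f^{(r)}]_{j,k-1} - [f^{(r)}]_{j,k+1} - \frac{\|f^{(r)}\|_2}{\|f^{(g)}\|_2}[f^{(g)}]_{j,k} - \frac{\|f^{(r)}\|_2}{\|f^{(b)}\|_2}[f^{(b)}]_{j,k},$$

$$[Rf]_{g,j,k} = 6[f^{(g)}]_{j,k} - [f^{(g)}]_{j-1,k} - [f^{(g)}]_{j+1,k} - [f^{(g)}]_{j,k-1} - [f^{(g)}]_{j,k+1} - \frac{\|f^{(g)}\|_2}{\|f^{(r)}\|_2}[f^{(r)}]_{j,k} - \frac{\|f^{(g)}\|_2}{\|f^{(b)}\|_2}[f^{(b)}]_{j,k},$$

and

$$[Rf]_{b,j,k} = 6[f^{(b)}]_{j,k} - [f^{(b)}]_{j-1,k} - [f^{(b)}]_{j+1,k} - [f^{(b)}]_{j,k-1} - [f^{(b)}]_{j,k+1} - \frac{\|f^{(b)}\|_2}{\|f^{(r)}\|_2}[f^{(r)}]_{j,k} - \frac{\|f^{(b)}\|_2}{\|f^{(g)}\|_2}[f^{(g)}]_{j,k},$$

for $1 \le j \le M_1$ and $1 \le k \le M_2$. Here $\|f^{(r)}\|_2$, $\|f^{(g)}\|_2$ and $\|f^{(b)}\|_2$ are
estimates of the norms of the original channel images and are assumed to
be nonzero. The cross-channel weights of this regularization matrix capture the
changes of reflectivity across the channels. In practice, we set $\|f^{(i)}\|_2 = \|g^{(i)}\|_2$
for $i \in \{r, g, b\}$, where $g^{(i)}$ is the observed image, see [4].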

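To sum up, the regularization matrix R is given by

$$R = \begin{pmatrix} 2 & -\frac{\|f^{(r)}\|_2}{\|f^{(g)}\|_2} & -\frac{\|f^{(r)}\|_2}{\|f^{(b)}\|_2} \\ -\frac{\|f^{(g)}\|_2}{\|f^{(r)}\|_2} & 2 & -\frac{\|f^{(g)}\|_2}{\|f^{(b)}\|_2} \\ -\frac{\|f^{(b)}\|_2}{\|f^{(r)}\|_2} & -\frac{\|f^{(b)}\|_2}{\|f^{(g)}\|_2} & 2 \end{pmatrix} \otimes I + I \otimes \Delta = S \otimes I + I \otimes \Delta, \tag{14}$$

where $\Delta$ is the 2-dimensional discrete Laplacian matrix with the Neumann
boundary condition. We note that $\Delta$ can be diagonalized by the 2-dimensional
cosine transform matrix $C_{M_1} \otimes C_{M_2}$ [8].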

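Using Tikhonov regularization, our problem becomes:

$$(A_L(\epsilon)^t \Upsilon A_L(\epsilon) + R^t R) f = A_L(\epsilon)^t \Upsilon g, \tag{15}$$

where $A_L(\epsilon)$ is given in (9),

$$\Upsilon = \begin{pmatrix} \alpha_r I & 0 & 0 \\ 0 & \alpha_g I & 0 \\ 0 & 0 & \alpha_b I \end{pmatrix} = \begin{pmatrix} \alpha_r & 0 & 0 \\ 0 & \alpha_g & 0 \\ 0 & 0 & \alpha_b \end{pmatrix} \otimes I = \Omega \otimes I,$$

and $\alpha_r$, $\alpha_g$ and $\alpha_b$ are the regularization parameters which are assumed to be
positive scalars.

Next we show that the regularized system

$$A_L^t \Upsilon A_L + R^t R = W^t \Omega W \otimes H_L^t H_L + (S \otimes I + I \otimes \Delta)^t (S \otimes I + I \otimes \Delta) \tag{16}$$

is well-conditioned.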



Theorem 2. Let W be a matrix with entries satisfying (5) and (12). Then there
exists a positive scalar $\gamma$, independent of $M_1$ and $M_2$, such that

$$\lambda_{\min}\{A_L^t \Upsilon A_L + R^t R\} \ge \gamma > 0. \tag{17}$$

Proof. Under the Neumann boundary condition, the matrices $H_L$ and $\Delta$ are
symmetric and can be diagonalized by $C_{M_1} \otimes C_{M_2}$. From (16), it therefore suffices
to consider the smallest eigenvalue of the matrix

$$W^t \Omega W \otimes \Lambda^2 + (S \otimes I + I \otimes \Sigma)^t (S \otimes I + I \otimes \Sigma) \tag{18}$$
where $\Lambda$ and $\Sigma$ are diagonal matrices with diagonal entries given by the
eigenvalues of $H_L$ and $\Delta$ respectively. More precisely, the diagonal entries
$\Lambda_{ij}$ of $\Lambda$ are given in (10) and the diagonal entries of $\Sigma$ are given by

$$\Sigma_{ij} = 4 \sin^2\left(\frac{(i-1)\pi}{2M_1}\right) + 4 \sin^2\left(\frac{(j-1)\pi}{2M_2}\right) \tag{19}$$

for $1 \le i \le M_1$ and $1 \le j \le M_2$.

By permutation, we see that the eigenvalues of the matrix in (18) are the
same as the eigenvalues of

$$B = \Lambda^2 \otimes W^t \Omega W + (I \otimes S + \Sigma \otimes I)^t (I \otimes S + \Sigma \otimes I), \tag{20}$$

which is a block-diagonal matrix, i.e., all off-diagonal blocks are zero. It therefore
suffices to estimate the smallest eigenvalues of the main diagonal blocks of B.
For $1 \le i \le M_1$, $1 \le j \le M_2$, the $((i-1)M_2 + j, (i-1)M_2 + j)$-th main diagonal
block of B is equal to

$$B_{ij} = \Lambda_{ij}^2 \cdot W^t \Omega W + (S + \Sigma_{ij} I)^t (S + \Sigma_{ij} I),$$

where $\Lambda_{ij}$ is given by the expression in (10) and $\Sigma_{ij}$ by (19). Since
$\lambda_{\min}(X + Y) \ge \lambda_{\min}(X) + \lambda_{\min}(Y)$ for any Hermitian matrices X and Y
(see [5, Corollary 8.1.3, p.411]), we have

$$\lambda_{\min}(B_{ij}) \ge \Lambda_{ij}^2 \lambda_{\min}(W^t \Omega W) + \lambda_{\min}\{(S + \Sigma_{ij} I)^t (S + \Sigma_{ij} I)\}.$$

By (13),

$$\lambda_{\min}(W^t \Omega W) \ge \min\{\alpha_r, \alpha_g, \alpha_b\} \cdot \lambda_{\min}(W^t W) = \delta \min\{\alpha_r, \alpha_g, \alpha_b\} = \delta_0 > 0, \tag{21}$$

where $\delta_0$ is a positive constant independent of $M_1$ and $M_2$. Hence

$$\lambda_{\min}(B_{ij}) \ge \delta_0 \Lambda_{ij}^2 + \lambda_{\min}\{(S + \Sigma_{ij} I)^t (S + \Sigma_{ij} I)\}. \tag{22}$$

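In view of (10) and (19), we define for simplicity

$$\chi(x, y) = \delta_0 \left(\frac{4}{L}\right)^4 \cos^4 x \cos^4 y \cdot p_L^2(2x) \, p_L^2(2y), \tag{23}$$

$$\phi(x, y) = 4 \sin^2 x + 4 \sin^2 y, \tag{24}$$

and

$$\psi(x, y) = \lambda_{\min}\{(S + \phi(x, y) I)^t (S + \phi(x, y) I)\}. \tag{25}$$

With these notations, (22) becomes

$$\lambda_{\min}(B_{ij}) \ge \chi\left(\frac{(i-1)\pi}{2M_1}, \frac{(j-1)\pi}{2M_2}\right) + \psi\left(\frac{(i-1)\pi}{2M_1}, \frac{(j-1)\pi}{2M_2}\right) \tag{26}$$

with $1 \le i \le M_1$, $1 \le j \le M_2$. To complete the proof, we now show that
$\chi(x, y) + \psi(x, y) > 0$ for all $(x, y) \in [0, \pi/2]^2$.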
From (14), it is easy to check that the eigenvalues of S are 0, 3 and 3 and
their corresponding eigenvectors are

$$\left[\|f^{(r)}\|_2, \|f^{(g)}\|_2, \|f^{(b)}\|_2\right]^t, \quad \left[1, -\frac{\|f^{(g)}\|_2}{\|f^{(r)}\|_2}, 0\right]^t \quad \text{and} \quad \left[1, 0, -\frac{\|f^{(b)}\|_2}{\|f^{(r)}\|_2}\right]^t$$

respectively. Therefore, in view of definition (24), for all $(x, y) \in [0, \pi/2]^2$, the
matrix $S + \phi(x, y)I$ is nonsingular except when x = y = 0. In particular, by
definition (25), $\psi(x, y) > 0$ for all $(x, y) \in [0, \pi/2]^2$ except at x = y = 0. Moreover,
since the entries $\|f^{(r)}\|_2$, $\|f^{(g)}\|_2$ and $\|f^{(b)}\|_2$ of S are constants independent
of $M_1$ and $M_2$, $\psi(x, y)$ depends only on x, y, $\|f^{(r)}\|_2$, $\|f^{(g)}\|_2$ and $\|f^{(b)}\|_2$ but
does not depend on $M_1$ and $M_2$. On the other hand, since $\cos^4(x) \, p_L^2(2x) \ge 1/4$
at x = 0 and is nonnegative in $[0, \pi/2]$, by (23), $\chi(x, y) > 0$ at x = y = 0 and is
nonnegative in $[0, \pi/2]^2$. Therefore there exists a positive scalar $\gamma$ independent
of $M_1$ and $M_2$ such that $\chi(x, y) + \psi(x, y) \ge \gamma > 0$ for all $(x, y) \in [0, \pi/2]^2$. It
follows from (26) that $\lambda_{\min}(B_{ij}) \ge \gamma > 0$ for all $1 \le i \le M_1$, $1 \le j \le M_2$.
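The spectral claim about S used in the proof can be checked directly. The sketch below assumes that the off-diagonal entry in row i and column j of S is minus the ratio of the norm estimate of channel i to that of channel j; the function `build_S` and the sample norm values are hypothetical.

```python
def build_S(nr, ng, nb):
    """3-by-3 cross-channel matrix: 2 on the diagonal, -n_i/n_j elsewhere.

    The orientation of the ratios is an assumption made for this sketch.
    """
    n = [nr, ng, nb]
    return [[2.0 if i == j else -n[i] / n[j] for j in range(3)] for i in range(3)]


def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]


# Hypothetical norm estimates for the red, green and blue channels.
nr, ng, nb = 3.0, 5.0, 7.0
S = build_S(nr, ng, nb)

null_vec = [nr, ng, nb]          # expected eigenvector for eigenvalue 0
v1 = [1.0, -ng / nr, 0.0]        # expected eigenvector for eigenvalue 3
v2 = [1.0, 0.0, -nb / nr]        # expected eigenvector for eigenvalue 3

print(matvec(S, null_vec))
print(matvec(S, v1), [3 * c for c in v1])
print(matvec(S, v2), [3 * c for c in v2])
```

Because S has the eigenvalue 0, the shifted matrix S + phi I is singular only when phi = 0, which is why the argument treats the point x = y = 0 separately.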

When there are errors in the subpixel displacements, the regularized matrix
is given by

$$A_L(\epsilon)^t \Upsilon A_L(\epsilon) + R^t R = W^t \Omega W \otimes H_L(\epsilon)^t H_L(\epsilon) + R^t R.$$

By using arguments similar to those in [8, Theorem 3], we can easily show that
this regularized matrix is well-conditioned when the errors are sufficiently small:

Corollary 1. Let $\epsilon^* = \max_{0 \le l_1, l_2 \le L-1}\{|\epsilon^x_{l_1 l_2}|, |\epsilon^y_{l_1 l_2}|\}$ and W be a matrix with
entries satisfying (5) and (12). If $\epsilon^*$ is sufficiently small, then the smallest
eigenvalue of $W^t \Omega W \otimes H_L(\epsilon)^t H_L(\epsilon) + R^t R$ is uniformly bounded away from 0 by a
positive constant independent of $M_1$ and $M_2$.

3 Spatially Invariant Case 

When there are no subpixel displacement errors, i.e., when all
$\epsilon^x_{l_1 l_2} = \epsilon^y_{l_1 l_2} = 0$, we have to solve $(A_L^t \Upsilon A_L + R^t R) f = A_L^t \Upsilon g$, which
according to (16) can be simplified to

$$(W^t \Omega W \otimes H_L^t H_L + R^t R) f = (W^t \Omega \otimes H_L^t) g. \tag{27}$$

Recall that if we use the Neumann boundary condition for both $H_L$ and $\Delta$, then
both matrices can be diagonalized by discrete cosine transform matrices. From
(16) and (18), we see that (27) is equivalent to

$$[W^t \Omega W \otimes \Lambda^2 + (S \otimes I + I \otimes \Sigma)^t (S \otimes I + I \otimes \Sigma)] \hat{f} = (W^t \Omega \otimes \Lambda) \hat{g}, \tag{28}$$

where $\hat{f} = (I \otimes C_{M_1} \otimes C_{M_2}) f$ and $\hat{g} = (I \otimes C_{M_1} \otimes C_{M_2}) g$. The system in (28) is
a block-diagonalized system of $M_1 M_2$ decoupled subsystems. The vector $\hat{f}$ can
be computed by solving a set of $M_1 M_2$ decoupled 3-by-3 matrix equations (cf.
(20)). The total cost of solving the system is therefore of $O(M_1 M_2 \log M_1 M_2)$
operations.
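One of the decoupled 3-by-3 subsystems can be sketched as follows. For a single frequency with eigenvalues lam (of the blur) and sig (of the Laplacian), the block is lam^2 W^t Omega W + (S + sig I)^t (S + sig I) and the right-hand side is lam W^t Omega g_hat. All function names, the sample weights and the norm values are assumptions chosen for illustration, and the ratio orientation in S is assumed as well.

```python
def transpose(A):
    return [list(r) for r in zip(*A)]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def solve(A, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def solve_frequency(lam, sig, W, alphas, S, g_hat):
    """Solve one decoupled 3-by-3 block of the transformed system."""
    Wt = transpose(W)
    WtO = [[Wt[i][j] * alphas[j] for j in range(3)] for i in range(3)]   # W^t Omega
    WtOW = matmul(WtO, W)
    T = [[S[i][j] + (sig if i == j else 0.0) for j in range(3)] for i in range(3)]
    TtT = matmul(transpose(T), T)
    B = [[lam * lam * WtOW[i][j] + TtT[i][j] for j in range(3)] for i in range(3)]
    rhs = [lam * c for c in matvec(WtO, g_hat)]
    return solve(B, rhs)


# Hypothetical data: sig = 0 and a true spectrum in the null space of S,
# so the regularization term vanishes and the solve must return u exactly.
W = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]]
norms = [3.0, 5.0, 7.0]
S = [[2.0 if i == j else -norms[i] / norms[j] for j in range(3)] for i in range(3)]
u = norms[:]
lam = 0.5
g_hat = [lam * c for c in matvec(W, u)]
f_hat = solve_frequency(lam, 0.0, W, [1.0, 1.0, 1.0], S, g_hat)
```

In a full solver this small solve would be repeated for each of the M1 M2 frequencies, with the cosine transforms applied before and after.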
4 Spatially Variant Case

When there are subpixel displacement errors, the matrix $H_L(\epsilon)$ has the same
banded structure as that of $H_L$, but with some entries slightly perturbed. It is a
near block-Toeplitz-Toeplitz-block matrix but it can no longer be diagonalized by
the cosine transform matrix. Therefore we solve the linear system in (15) by the
preconditioned conjugate gradient method. For an $M_1 \times M_1$ block matrix $H_L(\epsilon)$
with the size of each block equal to $M_2 \times M_2$, the cosine transform preconditioner
$c(H_L(\epsilon))$ of $H_L(\epsilon)$ is defined to be the matrix $(C_{M_1} \otimes C_{M_2})^t \Phi (C_{M_1} \otimes C_{M_2})$ that
minimizes

$$\|(C_{M_1} \otimes C_{M_2})^t \Phi (C_{M_1} \otimes C_{M_2}) - H_L(\epsilon)\|_F$$

over all diagonal matrices $\Phi$, where $\| \cdot \|_F$ is the Frobenius norm, see [3]. Clearly,
the cost of computing $c(H_L(\epsilon))^{-1} y$ for any vector y is $O(M_1 M_2 \log M_1 M_2)$
operations. Since $H_L(\epsilon)$ in (8) is a banded matrix with $(L+1)^2$ non-zero diagonals
and is of size $M_1 M_2 \times M_1 M_2$, the cost of constructing $c(H_L(\epsilon))$ is of $O(L^2 M_1 M_2)$
operations only, see [3].
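The minimization that defines the cosine transform preconditioner has a closed-form answer: because the Frobenius norm is invariant under orthogonal transforms, the optimal diagonal is the diagonal of C H C^t. The sketch below shows this in one dimension with dense matrices. It is an illustration only: it does not use the fast construction cited above, and the helper names and the L = 2 Neumann blur used as test matrix are assumptions.

```python
import math


def dct_matrix(M):
    """Orthonormal DCT-II matrix C with rows indexed by frequency."""
    C = []
    for k in range(M):
        s = math.sqrt((1.0 if k == 0 else 2.0) / M)
        C.append([s * math.cos((n + 0.5) * k * math.pi / M) for n in range(M)])
    return C


def cosine_preconditioner(H):
    """Return (phi, c(H)) where c(H) = C^t diag(phi) C and phi = diag(C H C^t)."""
    M = len(H)
    C = dct_matrix(M)
    phi = [sum(C[k][n] * H[n][m] * C[k][m] for n in range(M) for m in range(M))
           for k in range(M)]
    cH = [[sum(C[k][n] * phi[k] * C[k][m] for k in range(M)) for m in range(M)]
          for n in range(M)]
    return phi, cH


def neumann_blur(M):
    """Assumed L = 2 blur [1/4, 1/2, 1/4] with symmetric reflection."""
    H = [[0.0] * M for _ in range(M)]
    for n in range(M):
        for w, m in ((0.25, -1), (0.5, 0), (0.25, 1)):
            idx = n + m
            if idx < 0:
                idx = -idx - 1
            elif idx >= M:
                idx = 2 * M - 1 - idx
            H[n][idx] += w
    return H


def frob(A, B):
    """Frobenius norm of A - B."""
    return math.sqrt(sum((a - b) ** 2 for ra, rb in zip(A, B) for a, b in zip(ra, rb)))


M = 8
H0 = neumann_blur(M)
phi0, cH0 = cosine_preconditioner(H0)     # H0 is reproduced exactly

H1 = [row[:] for row in H0]
H1[0][1] += 0.05                          # a small hypothetical perturbation
phi1, cH1 = cosine_preconditioner(H1)     # closest matrix of the form C^t Phi C
```

For the unperturbed blur the preconditioner coincides with the matrix itself; for the perturbed matrix it is at least as close as the unperturbed blur is.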

We will employ the cosine transform preconditioner $c(H_L(\epsilon))$ of $H_L(\epsilon)$ in our
preconditioner. Thus we have to study the convergence rate of the conjugate
gradient method for solving the preconditioned system

$$[W^t \Omega W \otimes c(H_L(\epsilon))^t c(H_L(\epsilon)) + R^t R]^{-1} [W^t \Omega W \otimes H_L(\epsilon)^t H_L(\epsilon) + R^t R] f = [W^t \Omega \otimes H_L(\epsilon)^t] g. \tag{29}$$

By using arguments similar to those in [8], we can show that the spectra of the
preconditioned normal system are clustered around 1 for sufficiently small subpixel
displacement errors. A detailed proof can be found in [9].

Theorem 3. Let $\epsilon^* = \max_{0 \le l_1, l_2 \le L-1}\{|\epsilon^x_{l_1 l_2}|, |\epsilon^y_{l_1 l_2}|\}$ and W be a matrix with
entries satisfying (5) and (12). If $\epsilon^*$ is sufficiently small, then the spectra of the
preconditioned matrices

$$[W^t \Omega W \otimes c(H_L(\epsilon))^t c(H_L(\epsilon)) + R^t R]^{-1} [W^t \Omega W \otimes H_L(\epsilon)^t H_L(\epsilon) + R^t R]$$

are clustered around 1 and their smallest eigenvalues are uniformly bounded away
from 0 by a positive constant independent of $M_1$ and $M_2$.

Using standard convergence analysis of the conjugate gradient method, see
for instance [5, p.525], we conclude that the conjugate gradient method applied
to the preconditioned system (29) will converge superlinearly for sufficiently
small displacement errors. Since $H_L(\epsilon)$ has only $(L+1)^2$ non-zero diagonals,
the matrix-vector product $A_L(\epsilon)x$ can be done in $O(L^2 M_1 M_2)$ operations. Thus the
cost of each PCG iteration is $O(M_1 M_2 \log M_1 M_2 + L^2 M_1 M_2)$ operations, see [5,
p.529]. Hence the total cost for finding the high resolution image vector is of
$O(M_1 M_2 \log M_1 M_2 + L^2 M_1 M_2)$ operations.
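For readers unfamiliar with the iteration itself, a generic preconditioned conjugate gradient loop is sketched below. It is not the authors' implementation: the matrix-vector product and the preconditioner solve are passed in as functions, and the small example uses a simple diagonal (Jacobi) preconditioner as a stand-in for the cosine transform preconditioner. The zero initial guess and the relative residual test follow the setup described in the numerical section; the tolerance value is a parameter.

```python
import math


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def pcg(matvec, precond_solve, b, tol=1e-6, max_iter=200):
    """Preconditioned CG for a symmetric positive definite system.

    matvec(p) returns A p; precond_solve(r) returns an approximation of
    A^{-1} r. Returns the solution estimate and the iteration count.
    """
    x = [0.0] * len(b)                 # zero initial guess
    r = b[:]
    r0 = math.sqrt(dot(r, r))
    if r0 == 0.0:
        return x, 0
    z = precond_solve(r)
    p = z[:]
    rz = dot(r, z)
    for it in range(1, max_iter + 1):
        Ap = matvec(p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        if math.sqrt(dot(r, r)) / r0 < tol:
            return x, it
        z = precond_solve(r)
        rz_new = dot(r, z)
        beta = rz_new / rz
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter


# Hypothetical 3-by-3 symmetric positive definite test system.
A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
b = [6.0, 10.0, 8.0]                   # equals A times [1, 2, 3]
x, iters = pcg(
    lambda p: [dot(row, p) for row in A],
    lambda r: [ri / A[i][i] for i, ri in enumerate(r)],   # Jacobi stand-in
    b,
)
```

In the setting of this section, `matvec` would apply the regularized normal matrix and `precond_solve` would apply the inverse of its cosine transform approximation using fast transforms.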
5 Numerical Examples

In this section, we illustrate the effectiveness of using cosine transform
preconditioners for solving high resolution color image reconstruction problems. The
conjugate gradient method is employed to solve the preconditioned system
(29). The cross-channel weights for R (see (14)) are computed from the observed
high-resolution image, i.e., $\|f^{(i)}\|_2 = \|g^{(i)}\|_2$ for $i \in \{r, g, b\}$. We tried
the following two different degradation matrices to degrade the original color
image

$$\text{(i)} \begin{pmatrix} 0.8 & 0.1 & 0.1 \\ 0.1 & 0.8 & 0.1 \\ 0.1 & 0.1 & 0.8 \end{pmatrix} \otimes H_L(\epsilon) \quad \text{and} \quad \text{(ii)} \begin{pmatrix} 0.5 & 0.3 & 0.2 \\ 0.25 & 0.5 & 0.25 \\ 0.3 & 0.2 & 0.5 \end{pmatrix} \otimes H_L(\epsilon). \tag{30}$$

The interdependency between cross-channels of the first degradation matrix is
higher than that of the second degradation matrix. Gaussian white noise with
a signal-to-noise ratio of 30dB was added to each degraded image plane. We
remark that the second degradation matrix W (cf. (9)) has been used to test
the least squares restoration of multichannel images [4].

In the tests, we used the same regularization parameter for each channel, i.e.,
$\alpha_r = \alpha_g = \alpha_b = \alpha$. The initial guess was the zero vector and the stopping criterion
was $\|r^{(j)}\|_2 / \|r^{(0)}\|_2 < 10^{-6}$, where $r^{(j)}$ is the normal equations residual after j
iterations. Tables 1-4 show the numbers of iterations required for convergence for
L = 2 and 4, i.e., the number of sensor arrays used is 2 x 2 and 4 x 4 respectively. In
the tables, “cos”, “cir” or “no” signify that the cosine transform preconditioner,
the level-2 circulant preconditioner [3] or no preconditioner is used respectively.

We see from the tables that for both degradation matrices, the cosine transform
preconditioner converges much faster than the circulant preconditioners for
different M, $\alpha$ and $\epsilon^{x,y}_{l_1 l_2}$, where M (= $M_1$ = $M_2$) is the size of the reconstructed
image and $\epsilon^{x,y}_{l_1 l_2}$ are the subpixel displacement errors. Also the convergence rate
is independent of M for fixed $\alpha$ or $\epsilon^{x,y}_{l_1 l_2}$. These results show that our method is
very efficient.

Restored color images using our method can be found on-line in [9]. One 
will see that the details in the image are much better reconstructed under the 
Neumann boundary condition than that under the zero and periodic bound- 
ary conditions. Moreover, the boundary artifacts under the Neumann boundary 
condition are less prominent too. 
Table 2. Number of iterations for degradation matrix (ii) with L = 2 and
Table 3. Number of iterations for degradation matrix (i) with L = 4 and $\epsilon^{x,y}_{l_1 l_2}$ =
Table 4. Number of iterations for degradation matrix (ii) with L = 4 and
Abstract. We carry out a performance study on a single processing
node of the HITACHI SR8000. Each processing node of the SR8000 is a
shared memory parallel computer which is composed of eight scalar
processors with a pseudo-vector processing facility. In this study, we
implement highly optimized codes for basic linear operations including
matrix-matrix product, matrix-vector product and vector inner-product. As a
practical application of matrix-vector product, we examine the
performance of two iterative methods for linear systems: the conjugate gradient
(CG) method and the conjugate residual (CR) method.



1 Introduction 

The significance of a large-scale numerical computation is rapidly growing in var- 
ious scientific and technological fields such as structural analysis, fluid dynamics 
and quantum chemistry. In particular, high performance solvers for linear prob- 
lems are highly desired, since the problem is frequently turned into a linear 
system after a suitable discretization of space and time. The purpose of this 
study is to develop highly optimized linear operation codes on the HITACHI
SR8000, which is one of the latest parallel supercomputers. We restrict
ourselves to a single processing node in this paper. A single node of the SR8000 can
be considered as a shared memory parallel computer which is composed of eight
scalar processors with a pseudo-vector processing facility [1,2]. In this sense, the
present work takes a complementary role to ATLAS (Automatically Tuned
Linear Algebra Software) in [3], which is intended mainly for RISC processors. After
examining the size dependence of the performance of the tuned codes for some 
basic linear operations, we apply them to the conjugate gradient (CG) and the 
conjugate residual (CR) methods, which are typical iterative methods for linear 
systems. 

The paper is organized as follows. We summarize the experimental environment
in Sect. 2. In Sect. 3, we discuss tuning techniques for basic linear operations on 
the SR8000 and examine the performance of the tuned codes, which are applied 
to the CG and CR methods in Sect. 4. The current work is summarized in Sect. 5. 
2 Experimental Environment

We summarize the experimental environment of this work and also give the
specifications of the HITACHI SR8000. Numerical experiments were performed at the
Computer Centre Division, Information Technology Center, the University of Tokyo.
The SR8000 is composed of 128 processing nodes interconnected through a
three-dimensional hyper-crossbar network. The communication bandwidth available to
each node is 1GB/sec for a single direction. Each processing node is a shared
memory parallel computer with eight scalar processors (Instruction Processor,
IP) which are based on RISC architecture. Each IP has two multiply-add
arithmetic units with a machine cycle of 4nsec. As a result, the theoretical peak
performance of each IP is 1GFLOPS. The total theoretical peak performance of
each processing node is 8GFLOPS. Each IP is designed to achieve a performance
similar to that of a vector processor by adopting a pseudo-vector processing facility,
which suppresses the delay caused by cache misses.

We use a single processing node for numerical experiments. The programming
language is FORTRAN77. The compile options are “-64 -nolimit -noscope
-Oss -procnum=8 -pvfunc=3”. These options instruct the compiler to use 64-bit
addressing mode (“-64”), to remove limits of memory and time for compilation
(“-nolimit”), to forbid dividing a source code into multiple parts when it is
compiled (“-noscope”), to set the optimization level to the highest (“-Oss”), to
use 8 IP’s (“-procnum=8”) and to set the pseudo-vectorization level to the highest
(“-pvfunc=3”), respectively. We also give the compiler a directive concerning
parallelization among IP’s, which is described in the subsequent sections.

3 Basic Linear Operations 

In this section, we discuss the basic linear operations including vector operations, 
matrix-vector product and matrix-matrix product on a single processing node 
of the SR8000. 

We begin with the vector operations. Table 1 is a summary of four basic
vector operations which are often used to solve linear systems. In Table 1, x and y
are real n-vectors, while $\alpha$ is a real scalar. The usual inner-product is denoted by
$(\cdot, \cdot)$; $(x, y) = \sum_{i=1}^{n} x_i y_i$. The first column shows the name of the corresponding
subroutines in BLAS (Basic Linear Algebra Subprograms) [4,5,6,7], which is a
standard library for basic linear operations. The BLAS routines are classified
into three categories: Level 1 (Vector Operations), Level 2 (Matrix-Vector
Operations) and Level 3 (Matrix-Matrix Operations). The operations in Table 1
belong to Level 1 BLAS.

In the following, we assume the vector length n to be a multiple of eight,
namely the number of IP’s in a single node of the SR8000. The vector
operations in Table 1 can be written by using a single loop, which is pseudo-vectorized
and parallelized with a block distribution in our implementation. For a
parallelization of loops on a single node of the SR8000, there are two directives for
data distribution to each processor. The directive “*POPTION NOCYCLIC” to the
Table 1. List of basic vector operations

Name     Function
daxpy    y := y + αx
ddot     (x, y)
dnrm2    ||x||_2 := sqrt((x, x))
dscal    x := αx

Fig. 1. Performance of basic vector operations (horizontal axis: problem size k, with n = 2^k)

compiler indicates a block distribution: if the loop index i runs from 1 to n, the
operations for i = (k-1)n/8 + 1, (k-1)n/8 + 2, ..., kn/8 are performed on the k-th
processor (k = 1, 2, ..., 8). On the other hand, the directive “*POPTION CYCLIC”
indicates a cyclic distribution: the operations for i = k, k + 8, ..., n + k - 8 are
performed on the k-th processor (k = 1, 2, ..., 8). A cyclic distribution is useful
when the computational load is not balanced among IP’s with a block
distribution due to, for example, a branching statement inside the loop. Otherwise,
a block distribution is preferable to a cyclic distribution, because the latter
requires an overhead to calculate the loop index. For the vector operations in Table
1, a block distribution runs about 10% faster than a cyclic distribution.
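The two index distributions can be written out explicitly. The following sketch only enumerates which loop indices each processor would handle; the function names are illustrative and n is assumed to be a multiple of the processor count, as in the text.

```python
def block_indices(k, n, p=8):
    """Loop indices (1-based) handled by processor k under a block distribution."""
    return list(range((k - 1) * n // p + 1, k * n // p + 1))


def cyclic_indices(k, n, p=8):
    """Loop indices (1-based) handled by processor k under a cyclic distribution."""
    return list(range(k, n + 1, p))


n = 32
for k in (1, 2, 8):
    print(k, block_indices(k, n), cyclic_indices(k, n))
```

Both schemes give every processor n/8 indices; they differ only in whether those indices are contiguous or strided.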

We leave the loop-unrolling to the compiler for the operations in Table 1. 
For a single loop, the compiler can recognize the optimum depth for the loop- 
unrolling, since it can be determined from the number of floating-point registers 
on each IP. We have checked through numerical experiment that the compiler 
indeed gives rise to the optimum loop-unrolling even without any explicit
directive to the compiler. (The directive “*SOPTION UNROLL(k)” indicates
loop-unrolling to a depth of k.)

Fig. 1 shows the performance of daxpy, ddot, dnrm2 and dscal. The horizontal
axis is $k = \log_2 n$ with the problem size n, while the vertical axis shows
the performance in units of MFLOPS. One can see that the performance of
the operations for a single vector (dnrm2 and dscal) is saturated at k = 17.
This is because the data cache memory for each processing node is 128KB/IP
x 8 IP’s in the SR8000. For the operations for two vectors (daxpy and ddot),
the saturation occurs around a half of the problem size; k = 15 ~ 16. For each
operation, the performance is kept at a high level even for a larger problem size,
owing to the pseudo-vector facility. For k = 24 ($n = 2^{24}$ = 16,777,216), the
performance of daxpy, ddot, dnrm2 and dscal is 1755.6MFLOPS, 3359.8MFLOPS,
5565.5MFLOPS and 1322.6MFLOPS, respectively. From the viewpoint of the
arithmetic operations, the Euclidean norm dnrm2 is the same as the inner-
c     dgemv kernel for op(A) = A: two rows per outer iteration, so each
c     v(j) loaded in the inner loop is used twice
      do 10 i=1,m-1,2
         dtmp1=0.d0
         dtmp2=0.d0
         do 20 j=1,n
            dtmp1=dtmp1+a(i,j)*v(j)
            dtmp2=dtmp2+a(i+1,j)*v(j)
   20    continue
         u(i)=beta*u(i)+alpha*dtmp1
         u(i+1)=beta*u(i+1)+alpha*dtmp2
   10 continue

Fig. 2. Kernel code of dgemv for op(A) = A. Additional statements are
required if m is odd

c     dgemv kernel for op(A) = A^T: columns i and i+1 of a are read
c     with unit stride in the inner loop
      do 10 i=1,n-1,2
         dtmp1=0.d0
         dtmp2=0.d0
         do 20 j=1,m
            dtmp1=dtmp1+a(j,i)*v(j)
            dtmp2=dtmp2+a(j,i+1)*v(j)
   20    continue
         u(i)=beta*u(i)+alpha*dtmp1
         u(i+1)=beta*u(i+1)+alpha*dtmp2
   10 continue

Fig. 3. Kernel code of dgemv for op(A) = A^T. Additional statements
are required if n is odd



product ddot. However, dnrm2 is about 1.6 times faster than ddot. This is due to
the fact that the statement $s := s + x_i y_i$ requires two load operations and one
multiply-add operation. As a result, the performance of ddot is at most 4GFLOPS, namely
50% of the peak performance of a single processing node. On the other hand,
dnrm2 requires only one load operation for a single multiply-add operation. This
explains the ratio of the performance between dnrm2 and ddot. One can also
observe that ddot is almost twice as fast as daxpy. This is because daxpy
requires a store operation after two load operations and one multiply-add operation, which
is unnecessary for ddot. Similarly, dscal requires a store operation after one load
operation and one multiplication. Thus the ratio of arithmetic operations to
data operations is the smallest in dscal. This is the reason why the score of dscal
is the poorest in Fig. 1.
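The argument above can be summarized as a ratio of floating-point operations to memory operations per vector element. The sketch below tabulates that ratio from the operation counts stated in the text and compares the resulting ordering with the MFLOPS figures reported for k = 24. It is a rough illustration of the reasoning, not a performance model; the dictionary layout is an assumption.

```python
# Per-element operation counts as described in the text
# (a multiply-add is counted as two floating-point operations).
ops = {
    "dnrm2": {"flops": 2, "loads": 1, "stores": 0},
    "ddot":  {"flops": 2, "loads": 2, "stores": 0},
    "daxpy": {"flops": 2, "loads": 2, "stores": 1},
    "dscal": {"flops": 1, "loads": 1, "stores": 1},
}

# MFLOPS reported for k = 24.
measured = {"daxpy": 1755.6, "ddot": 3359.8, "dnrm2": 5565.5, "dscal": 1322.6}

intensity = {
    name: c["flops"] / (c["loads"] + c["stores"]) for name, c in ops.items()
}

by_intensity = sorted(ops, key=intensity.get, reverse=True)
by_measured = sorted(measured, key=measured.get, reverse=True)

for name in by_intensity:
    print(f"{name:6s} flops/memory-op = {intensity[name]:.2f}  measured = {measured[name]} MFLOPS")
```

The ordering by this ratio matches the ordering of the measured rates, which is the qualitative point made in the paragraph above.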

We proceed to the general matrix-vector product dgemv,
$u := \alpha\,\mathrm{op}(A)\,v + \beta u$, in Level 2 BLAS. Here, $\alpha$ and
$\beta$ are real scalars, A is a real m x n matrix, u and v are real vectors,
and op(A) is A or A^T. Clearly, the operations for each component of dgemv can
be performed independently. Figs. 2 and 3 show the kernel double loops for
op(A) = A and op(A) = A^T, respectively. We parallelize the outer loop with
index i with a block distribution and also pseudo-vectorize the inner loop for
the vector inner product. In the source codes in Figs. 2 and 3, we employ loop
unrolling to a depth of two for the outer loop. For the inner loop, we leave
the unrolling to the compiler, as in vector operations. As a result, the inner
loop is unrolled to a depth of four. Unrolling the outer loop to a depth of two
halves the number of load operations for the vector v in the inner loop. It
also halves the length of the outer loop. As a result, the performance is
improved by about 10%. We have examined unrolling the outer loop to greater
depths in numerical experiments. The performance for a depth of four is almost
the same as for a depth of two, while a depth of eight yields only 10% of the
performance obtained with a depth of two. A depth of eight is too large for the
relevant elements to be kept in the floating-point registers on each IP.
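The outer-loop unrolling described above can be illustrated by the following sketch. It is not the FORTRAN code of Figs. 2 and 3 (which is not reproduced here); it is a plain Python illustration for op(A) = A with a row-wise list of lists, and it omits the scalars of dgemv for brevity. The accumulator names dtmp1 and dtmp2 follow the text; the function name `matvec_unrolled` is hypothetical.

```python
def matvec_unrolled(a, v):
    """Compute u = A v with the outer loop unrolled to a depth of two.

    Each element v[j] is read once and used for two rows of A, which is the
    saving in load operations that the unrolling is meant to provide.
    """
    m = len(a)
    n = len(v)
    u = [0.0] * m
    for i in range(0, m - 1, 2):
        dtmp1 = 0.0
        dtmp2 = 0.0
        for j in range(n):
            vj = v[j]  # one load of v[j] serves both accumulators
            dtmp1 += a[i][j] * vj
            dtmp2 += a[i + 1][j] * vj
        u[i] = dtmp1
        u[i + 1] = dtmp2
    if m % 2 == 1:
        # Additional statements for an odd number of rows (remainder row).
        u[m - 1] = sum(a[m - 1][j] * v[j] for j in range(n))
    return u


if __name__ == "__main__":
    print(matvec_unrolled([[1, 2], [3, 4], [5, 6]], [1, 1]))
```

In an interpreted language the unrolling brings no speed benefit; the sketch only shows the loop structure and the handling of the remainder row.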
Fig. 4. Performance of dgemv as a function of the problem size n. Solid and
broken lines show the performance for op(A) = A and op(A) = A^T, respectively

Fig. 4 shows the performance of the dgemv routine. The matrix is assumed to
be square (m = n). The size of the matrix is changed as n = 256 x i; i =
1, 2, ..., 32. For n = 8192, the performance is 2797.1 MFLOPS and 3811.5
MFLOPS for op(A) = A and op(A) = A^T, respectively. Since the matrix is
stored by columns in FORTRAN, the memory access is continuous for
op(A) = A^T. Thus op(A) = A^T is expected to show high performance, comparable
to the inner product. Indeed, one can see that the broken line in Fig. 4
reproduces the performance of the inner product ddot in Fig. 1. Recall that
since the loop for vector operations in Fig. 1 is parallelized among eight IPs,
each IP processes only one eighth of the vector elements. Thus, Fig. 4 should
be compared with the performance of ddot for k = 12 ~ 16 in Fig. 1. In the case
of op(A) = A, the memory access to the matrix A is not continuous, as seen from
Fig. 2. This is why op(A) = A reaches only about 75% of the performance of
op(A) = A^T. The drawback might be removed by interchanging the outer and inner
loops. In that case, however, the variables dtmp1 and dtmp2 would have to be
vectors instead of scalars. This causes additional store operations, and a
substantial delay is observed: the asymptotic performance is about 2000 MFLOPS.

Finally, we consider the matrix-matrix product. The matrix-matrix product
A = BC is given by $a_{ij} = \sum_{k=1}^{n} b_{ik} c_{kj}$; i, j = 1, 2, ..., n.
For simplicity, we assume that $A = (a_{ij})$, $B = (b_{ik})$ and
$C = (c_{kj})$ are n x n square matrices. Clearly, the source code of the
matrix-matrix product is composed of a triply nested loop with indices i, j, k.
Numerical experiments show that the jki form together with the options
described below gives the best performance. Here the loop indices are ordered
as j, k and i from the outer loop to the inner loop in the jki form. This
corresponds to the linear combination algorithm. We parallelize the outermost
loop with index j with a block distribution and also pseudo-vectorize the
innermost loop with index i.
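The jki loop ordering can be written out as follows. This is a minimal Python sketch of the loop structure only; the parallelization directive and pseudo-vectorization used on the SR8000 are not represented, and the function name `matmul_jki` is hypothetical.

```python
def matmul_jki(b, c):
    """Compute A = B C for square matrices using the jki loop order.

    For fixed j and k, the innermost loop adds c[k][j] times column k of B
    to column j of A, i.e. column j of A is built as a linear combination
    of the columns of B.
    """
    n = len(b)
    a = [[0.0] * n for _ in range(n)]
    for j in range(n):
        for k in range(n):
            ckj = c[k][j]  # scalar reused throughout the innermost loop
            for i in range(n):
                a[i][j] += b[i][k] * ckj
    return a


if __name__ == "__main__":
    print(matmul_jki([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

With column-major storage, as in FORTRAN, the innermost loop over i walks down columns of A and B, which is why this ordering gives stride-one access there.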

Fig. 5 shows the performance of the matrix-matrix product for problem sizes
n = 256 x i; i = 1, 2, ..., 32. One can see that the performance is extremely high.
Fig. 5. Performance of matrix-matrix product

Fig. 6. Performance of CG and CR methods



Indeed, it is above 75% of the theoretical peak performance of a single
processing node of the SR8000 for problem sizes n ~ 500 ~ 4000: 6101.6 MFLOPS
for n = 512 and 7021.5 MFLOPS for n = 3840. However, a further increase of the
vector length leads to a sudden slowdown caused by cache misses. The
performance is 2695.5 MFLOPS for n = 8192, for instance. The so-called block
algorithms, which utilize submatrices rather than just columns or rows, are
expected to remedy the situation.



4 CG and CR Methods 

As an example of the use of the basic linear codes of Sect. 3 at a practical
level, we examine the performance of the CG and CR methods. We consider a
linear system Ax = b. Here x is an unknown n-vector which is to be determined
when an n x n nonsingular matrix A and an n-vector b are given. Figs. 7 and 8
show the algorithms of the CG and CR methods, respectively. We assume the
coefficient matrix A to be dense, and in the numerical experiments we use the
Frank matrix $A = (a_{ij})$ with $a_{ij} = \min\{i, j\}$; i, j = 1, 2, ..., n,
which is widely used as a benchmark.

For the implementation of the CG and CR methods on the SR8000, we use the
optimized codes of the previous section. We transpose the matrix A and calculate
the matrix-vector product Ax from the stored transpose, using dgemv with the
transposed operation. This ensures continuous
memory access to the matrix A^T. Note that the transposed matrix A^T can be
overwritten on the original A, since one uses A only in a form of the matrix- vector 
product Ax. We set the relative residual to e = 10“^^ in Figs. 7 and 8. 

Fig. 6 shows the performance of the CG and CR methods for the problem sizes
n = 256 x i; i = 1, 2, ..., 32. For n = 8192, the performance of the CG and CR
methods is 3785.8 MFLOPS and 3729.2 MFLOPS, respectively. These values are close
to the asymptotic performance of the dgemv subroutine, nearly 50% of the peak
performance of a single processing node of the SR8000. If we do not transpose
Take an initial guess $x_0$;
$r_0 := b - A x_0$; $p_0 := r_0$;
for $k := 0, 1, 2, \ldots$ until $\|r_k\| < \varepsilon \|b\|$ do
begin
    $a := A p_k$;
    $\mu := (p_k, a)$;
    $\alpha_k := (r_k, p_k)/\mu$;
    $x_{k+1} := x_k + \alpha_k p_k$;
    $r_{k+1} := r_k - \alpha_k a$;
    $\beta_k := -(r_{k+1}, a)/\mu$;
    $p_{k+1} := r_{k+1} + \beta_k p_k$;
end

Fig. 7. Algorithm of CG method



Take an initial guess $x_0$;
$r_0 := b - A x_0$; $p_0 := r_0$; $q := A p_0$;
for $k := 0, 1, 2, \ldots$ until $\|r_k\| < \varepsilon \|b\|$ do
begin
    $\mu := (q, q)$;
    $\alpha_k := (r_k, q)/\mu$;
    $x_{k+1} := x_k + \alpha_k p_k$;
    $r_{k+1} := r_k - \alpha_k q$;
    $a := A r_{k+1}$;
    $\beta_k := -(a, q)/\mu$;
    $p_{k+1} := r_{k+1} + \beta_k p_k$;
    $q := a + \beta_k q$;
end

Fig. 8. Algorithm of CR method



the coefficient matrix and use dgemv with op(A) = A, the performance is at
most 2700 MFLOPS, as we have confirmed experimentally.
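The CG iteration of Fig. 7 applied to the Frank matrix can be sketched as follows. This is a small pure-Python illustration, not the tuned SR8000 code: the matrix-vector product is a plain double loop, the initial guess is the zero vector, and the tolerance `eps = 1e-10` and the iteration cap `max_iter` are assumptions chosen only for this example.

```python
def frank(n):
    """Frank matrix with entries a_ij = min(i, j), i, j = 1, ..., n."""
    return [[float(min(i, j)) for j in range(1, n + 1)] for i in range(1, n + 1)]


def dot(x, y):
    return sum(xi * yi for xi, yi in zip(x, y))


def matvec(a, x):
    return [dot(row, x) for row in a]


def cg(a, b, eps=1e-10, max_iter=1000):
    """Conjugate gradient method; returns the solution and the iteration count."""
    x = [0.0] * len(b)  # initial guess x_0 = 0 (assumption)
    r = b[:]            # r_0 = b - A x_0 = b
    p = r[:]
    b_norm = dot(b, b) ** 0.5
    for k in range(max_iter):
        if dot(r, r) ** 0.5 <= eps * b_norm:
            return x, k
        q = matvec(a, p)
        mu = dot(p, q)
        alpha = dot(r, p) / mu
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, q)]
        beta = -dot(r, q) / mu
        p = [ri + beta * pi for ri, pi in zip(r, p)]
    return x, max_iter


if __name__ == "__main__":
    a = frank(8)
    b = matvec(a, [1.0] * 8)  # right-hand side with known solution (1, ..., 1)
    solution, iterations = cg(a, b)
    print(iterations, solution)
```

Almost all of the work per iteration is in `matvec`, which is why the performance of the CG and CR methods follows that of dgemv.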

We emphasize that it is not always expensive to use iterative methods for
dense linear systems. If we apply the diagonal preconditioner (where the
preconditioner is just the diagonal of A) to the Frank matrix in the CG method,
only 100 ~ 600 iterations are required over a wide range of problem sizes,
512 ~ 8192, where the right-hand side b and the initial guess $x_0$ are chosen
to be generic. In such cases, our tuned code for the CG method obtains the
solution considerably faster than LU decomposition, even if LU decomposition
attained the theoretical peak performance. This implies that highly optimized
codes of iterative methods are useful even for dense linear systems, at least
for some classes where a suitable preconditioner is known.



5 Summary 

In this paper, we implemented highly optimized codes for basic linear
operations on a single processing node of the HITACHI SR8000 and evaluated
their performance. Concerning the tuning techniques, one should select a
suitable technique according to the loop structure. For vector operations like
Level 1 BLAS, which are described by a single loop, an adequate selection of
the data distribution by a compiler directive plays an important role in
obtaining good performance in parallel processing. By adopting a block
distribution, we attained 25%, 50%, 75% and 17% of the theoretical peak
performance for daxpy, ddot, dnrm2 and dscal, respectively. On the other hand,
unrolling the outer loop is efficient for multiply nested loops such as the
matrix-vector product (dgemv) and the matrix-matrix product. This is because
one can save load and store operations in the inner loop. For the matrix-vector
product, we observed 35% of the peak performance. It is enhanced to nearly 50%
by using the transposition of the matrix. This is because the use of the
transposed matrix in the matrix-vector product ensures continuous memory access
to the matrix elements in FORTRAN. Concerning the matrix-matrix product, we
observed 85% of the peak performance. Although cache misses slow down the
present code of the matrix-matrix product, the block algorithms are expected to
prevent cache misses. As a realistic application of the basic linear codes, we
implemented the CG and CR methods. They show the same performance as the
matrix-vector product with the transposition of the matrix.

Putting all the experimental facts together reveals the essence of a
single processing node of the SR8000. Unlike on a vector processor such as the
HITAC S-3800, the effects of different aspects such as parallelization by a
directive, cache misses and memory access should be taken into account for
optimization, according to the structure of the loops. Cache misses cause a
considerable loss of performance. However, unlike on the usual scalar machines,
the pseudo-vector processing facility serves to partly suppress the loss caused
by cache misses, and it indeed minimizes the effect of cache misses for a
certain range of the problem size, as shown by the plateau in Fig. 5. In
summary, we should remember that a single node of the SR8000 is a shared memory
parallel computer composed of eight scalar (RISC) processors equipped with a
pseudo-vector processing facility.

In future work, we shall (1) tune other BLAS routines, (2) implement the
block algorithms for the matrix-matrix product, and (3) extend the present work
to multiple processing nodes.
Abstract. Long-term exposures to high ozone concentrations have
harmful effects on crops and reduce the yield. The exposures are
measured in terms of AOT40 (Accumulated exposure Over Threshold of
40 ppb). The threshold of 40 ppb has been accepted after several years
of experimental research in open top chambers throughout Europe. As
a result of these experiments, a critical level of 3000 ppb.hours has been
established for crops. The sensitivity of wheat to exposures above
the critical level has been studied in more detail, and a linear dependence
between the relative yield and the AOT40 has been found. This relationship
is used in the paper to estimate the wheat losses in Bulgaria and
Denmark by regions in several consecutive years.

The Danish Eulerian Model is used to calculate the AOT40 values on the
EMEP grid (a 96 x 96 square grid with a step of 50 km, which covers Europe).
Only the results on the parts of this grid covering Bulgaria and Denmark
are used. In addition, regional information about the actual yield and the
prices is needed for the target years. The algorithm for economic
evaluation of the losses can be applied with different scenarios for partial
reduction of the emissions and some other key parameters. The results
can be used as a basis for a cost-benefit analysis of possible ozone
reduction measures when effects on vegetation are studied.



1 Introduction 

The damaging effects of high ozone concentrations on agricultural crops have
been known for a long time, but little was known until 1990 about the extent of
these damages worldwide. Extensive research on this problem has been conducted
during the past ten years. Newly designed open top chambers (OTC) allow
systematic study, leading to quantifiable estimates for use in policy analysis.
To report on early progress, a number of meetings were held during the 1990s,
e.g. in Switzerland in 1993 ([4]) and in Finland in 1995 ([6]). Among the
recommendations from these meetings, a new parameter called AOT40 was
introduced ([4] and [6]). This parameter was suggested for use in agricultural
and economic assessments and in subsequent modeling of the benefits associated
with reduced ozone exposure. The AOT40 parameter is now commonly accepted,
including in the discussions of the forthcoming EU Ozone Directive (see [1,3]).

The value 40 ppb is a practically determined threshold, below which the losses 
of crops due to ozone exposure could be neglected, and above which the losses 
are assumed to be linear with respect to the exposure. The choice of AOT40 
is based on a large number of OTC experiments in Europe and in the United 
States. 

In this work wheat losses in Bulgaria and Denmark for a period of ten years 
are calculated by using AOT40 values produced by the Danish Eulerian Model 
(DEM) [10,11]. Our work on these problems started a year ago [2]. Since then 
the model was improved significantly in its chemical part and in the vertical ex- 
change through the boundary layers in accordance with the knowledge obtained 
by analyzing new measurement data. In addition a new release of emissions 
data from the EMEP inventories is used as an input in DEM now. All these 
developments lead to certain differences in the results, presented in this paper, 
in comparison with [2]. We believe our new estimates are much more accurate, 
although many uncertainties still remain. 



2 Data Sets Used in the Study 

The following data files are used in the calculations: 

— The AOT40 values on the EMEP grid for each year in the period 1989 - 1998 
(only the values on the parts of the grid covering Bulgaria and Denmark 
are used). These are produced by DEM and verified by comparisons with 
measurements and with results, obtained by other models. The maps with 
the AOT40 values over Bulgaria for 1990 and 1994 are shown in Fig. 1; 

— The relationship between the AOT40 and the wheat loss, based on experi- 
ments in OTC, presented at meetings in Switzerland and Finland [5,4,7]; 

— The wheat yield in Bulgaria and Denmark by regions for the years under 
consideration, taken from the corresponding Statistical Yearbooks [8,9]. 



3 Calculating the Losses by Regions 

The basic assumption behind the introduction of the excess ozone concept, and
of AOT40 in particular, is that the relative yield of wheat depends linearly on
the value of AOT40. Provided that y is the actual yield, y + z is the expected yield
Fig. 1. AOT40 maps of Bulgarian region for 1990 and 1994 (ozone exposures for
crops, basic scenario). The numbers on the maps show the AOT40C values in % of
the critical value 3000 ppb.hours (100*AOT40C/3000)



without any ozone exposure, and $\xi$ is the AOT40 in ppb.hours, this linear
regression can be expressed as follows:

$$100\,y/(y + z) = \alpha\xi + \beta \quad (\alpha < 0,\ \beta \approx 100) \qquad (1)$$

where $\alpha$ and $\beta$ are empirically determined coefficients. The values
$\alpha = -0.00151$ and $\beta = 99.5$, derived from OTC experiments performed
in the Scandinavian countries, are used for calculating the losses in Denmark.
The mean values, obtained by analyzing a wide set of OTC experiments
representative for Europe, are $\alpha = -0.00177$ and $\beta = 99.6$. The
latter values are used in calculating the losses in Bulgaria. The specific
coefficients for South-European countries (including Bulgaria) could be
slightly different. Due to the small number of OTC experiments in these
countries, such specific coefficients cannot be determined yet. The above
values are due to Pleijel [7].

Let us first consider a simplified (scalar) version of the task: to find the
loss z of a given crop yield y from a single region with a constant value $\xi$
of AOT40. The linear regression (1) gives the actual yield y (given in our
task) as a function of x = y + z, where z is the unknown to be calculated:

$$y = \frac{(\alpha\xi + \beta)(y + z)}{100}$$

$$100\,y = (\alpha\xi + \beta)\,y + (\alpha\xi + \beta)\,z$$

$$(100 - \beta - \alpha\xi)\,y = (\alpha\xi + \beta)\,z$$

$$z = \frac{100 - \beta - \alpha\xi}{\alpha\xi + \beta}\,y = f(\xi)\,y \qquad (2)$$
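Formula (2) can be evaluated directly, as in the following sketch. The coefficients are the European mean values quoted in the text; the yield of 100 (thousand tons) and the AOT40 value of 3000 ppb.hours in the usage example are illustrative inputs, not data from the study, and the function names are hypothetical.

```python
ALPHA_EUROPE = -0.00177  # mean European coefficient quoted in the text
BETA_EUROPE = 99.6


def relative_loss_factor(xi, alpha=ALPHA_EUROPE, beta=BETA_EUROPE):
    """f(xi) from (2): the loss z expressed as a multiple of the actual yield y."""
    return (100.0 - beta - alpha * xi) / (alpha * xi + beta)


def wheat_loss(actual_yield, xi, alpha=ALPHA_EUROPE, beta=BETA_EUROPE):
    """Loss z = f(xi) * y for a single region with a constant AOT40 value xi."""
    return relative_loss_factor(xi, alpha, beta) * actual_yield


if __name__ == "__main__":
    y = 100.0    # illustrative actual yield, thousand tons
    xi = 3000.0  # illustrative AOT40 value, ppb.hours
    z = wheat_loss(y, xi)
    # Consistency check against (1): both numbers should agree.
    print(100.0 * y / (y + z), ALPHA_EUROPE * xi + BETA_EUROPE)
```

The check at the end substitutes the computed z back into the regression (1), which is a convenient way to validate an implementation of (2).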
Consider now the real task: to find the losses from each of the m regions
of the country, covered by n grid cells, taking into account that each region is
covered by several grid cells with different values of AOT40. Denote by

$\Xi = (\xi_j)_{j=1}^{n}$ - the AOT40 vector (n x 1), per grid cell, calculated by the DEM;
A - the regional division matrix (m x n);
Y - the yield matrix (m x k), yields of k crops per region;
Z - the loss matrix (m x k), per region (unknown).

The calculations can be done in the following way (not unique):

$$S = A\Theta, \qquad Z = \mathrm{diag}(S)\,Y$$

Applying f from (2) to $\Xi$ componentwise, we first find the relative losses by
grid cells $\Theta$ (with respect to the actual yields). The matrix-vector
product $A\Theta$ gives the relative losses by regions (S), and multiplying the
rows of Y by them we obtain the corresponding losses Z.
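The regional computation can be sketched as follows. The sketch assumes that each row of the regional division matrix A holds averaging weights over the grid cells of that region (non-negative, summing to one); the text does not specify how A is built, so this is an assumption, as are the function names and the small input data in the usage example.

```python
def f(xi, alpha, beta):
    """Relative loss factor from (2) for a single AOT40 value."""
    return (100.0 - beta - alpha * xi) / (alpha * xi + beta)


def regional_losses(xi_cells, a, y, alpha=-0.00177, beta=99.6):
    """Return the loss matrix Z (m x k) from AOT40 per grid cell.

    xi_cells: AOT40 values per grid cell (length n)
    a:        regional division matrix (m x n), rows assumed to be weights
    y:        yield matrix (m x k), yields of k crops per region
    """
    theta = [f(xi, alpha, beta) for xi in xi_cells]                          # per cell
    s = [sum(a_ij * t_j for a_ij, t_j in zip(row, theta)) for row in a]      # S = A Theta
    return [[s_i * y_ik for y_ik in y_row] for s_i, y_row in zip(s, y)]      # diag(S) Y


if __name__ == "__main__":
    xi_cells = [3000.0, 6000.0, 9000.0]   # illustrative AOT40 values
    a = [[0.5, 0.5, 0.0],                 # region 1 covers cells 1 and 2
         [0.0, 0.0, 1.0]]                 # region 2 covers cell 3
    y = [[100.0], [50.0]]                 # one crop, two regions
    print(regional_losses(xi_cells, a, y))
```

Because f is applied per grid cell before averaging, a region whose cells have different AOT40 values gets the weighted mean of the cell-wise relative losses rather than the relative loss at the mean AOT40.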

4 Reduced Traffic Scenario 

Traffic is known to be one of the primary sources of ozone pollution in the 
developed countries. In order to evaluate the contribution of the traffic to the 
overall ozone pollution and the resulting economical losses, a scenario with 90% 
reduction of the actual traffic emissions is included in our study. This scenario is 
called hereafter traffic scenario, unlike the basic scenario, which denotes the ac- 
tual situation and the corresponding actual losses. The traffic scenario is applied 
to the same 10-year period and the corresponding AOT40 values are calculated 
by using the Danish Eulerian Model [10,11]. The flexibility of the model allows 
us to calculate these values by proper reduction of the actual emissions and 
keeping all the other input data unchanged. Simple calculations show that the 
traffic scenario leads approximately to the following global reductions of the 
anthropogenic emissions: 

— 45% reduction of the anthropogenic NOx emissions; 

— 40% reduction of the anthropogenic VOC emissions; 

— 54% reduction of the anthropogenic CO emissions; 

— no change in the anthropogenic SO2 and NH3 emissions as well as in all
biogenic emissions.

The results for the estimated wheat losses in Bulgaria and Denmark, obtained 
both with the basic and the traffic scenario, are given in the next two sections. 

5 Estimated Wheat Losses in Bulgaria 

Numerical results for the estimated wheat losses due to high ozone levels in 
Bulgaria during a ten-year period (1989 - 1998) are presented in this section. 
Table 1. Ozone-caused wheat losses in Bulgaria for 1990

                 Wheat yield   Losses of wheat (in thousand tons and %)
                 in 1990
Region           Yield         Basic scenario     Traffic scenario    Savings
1 Sofia City       34.5          12.7   26.9%        8.6   19.9%         4.1
2 Burgas          671.1         231.7   25.7%      161.6   19.4%        70.1
3 Varna          1257.2         430.4   25.5%      300.7   19.3%       129.7
4 Lovech          802.4         285.0   26.2%      199.2   19.9%        85.8
5 Montana         667.9         276.9   29.3%      199.6   23.0%        77.3
6 Plovdiv         277.7         105.6   27.5%       75.8   21.4%        29.8
7 Russe           887.7         350.0   28.3%      253.5   22.2%        96.5
8 Sofia           209.8          79.0   27.4%       56.5   21.2%        22.5
9 Haskovo         483.8         173.6   26.4%      125.1   20.5%        48.5
Whole country    5292.         1945.0   26.9%     1380.6   20.7%       564.4


Table 2. Ozone-caused wheat losses in Bulgaria for 1994

                 Wheat yield   Losses of wheat (in thousand tons and %)
                 in 1994
Region           Yield         Basic scenario     Traffic scenario    Savings
1 Sofia City       26.9           7.5   21.8%        5.0   15.5%         2.5
2 Burgas          442.3         105.9   19.3%       74.4   14.4%        31.5
3 Varna           956.4         251.5   20.8%      176.5   15.6%        75.0
4 Lovech          485.1         128.5   20.9%       90.7   15.7%        37.8
5 Montana         469.6         133.1   22.1%       99.3   17.5%        33.8
6 Plovdiv         209.6          58.9   21.9%       43.0   17.0%        15.9
7 Russe           664.0         189.2   22.2%      137.0   17.1%        52.2
8 Sofia           143.4          40.4   22.0%       29.4   17.0%        11.0
9 Haskovo         357.0          95.7   21.1%       71.4   16.7%        24.3
Whole country    3754.         1011.0   21.2%      726.7   16.2%       284.3


Table 3. Ozone-caused wheat losses in Bulgaria for 1998

                 Wheat yield   Losses of wheat (in thousand tons and %)
                 in 1998
Region           Yield         Basic scenario     Traffic scenario    Savings
1 Sofia City       32.4          10.4   24.3%        5.8   15.2%         4.6
2 Burgas          452.4         122.1   21.3%       68.0   13.1%        54.1
3 Varna           733.7         219.7   23.0%      121.0   14.2%        98.7
4 Lovech          455.4         151.0   24.9%       87.4   16.1%        63.6
5 Montana         326.2          97.8   23.1%       61.3   15.8%        36.5
6 Plovdiv         217.8          71.4   24.7%       40.3   15.6%        31.1
7 Russe           547.7         182.6   25.0%      111.6   16.9%        71.0
8 Sofia           132.0          42.5   24.4%       24.3   15.6%        18.2
9 Haskovo         315.2          96.9   23.5%       57.3   15.4%        39.6
Whole country    3213.          994.0   23.6%      577.0   15.2%       417.0



The results for 1990, 1994 and 1998 are presented in more detail in Tables 1, 2
and 3. In Table 4 the mean values for the ten-year period under consideration
are given.
Table 4. Average ozone-caused wheat losses in Bulgaria for the period 1989 -
1998

                 Average yield   Average losses (in thousand tons and %)
                 (1989-98)
Region           Yield           Basic scenario     Traffic scenario    Savings
1 Sofia City       21.1             7.4   26.0%        5.1   19.6%         2.3
2 Burgas          485.4           131.0   21.3%       96.4   16.6%        34.6
3 Varna           871.4           253.5   22.5%      186.5   17.6%        67.0
4 Lovech          465.6           141.4   23.3%      103.7   18.2%        37.7
5 Montana         391.5           136.2   25.8%      103.7   20.9%        32.5
6 Plovdiv         237.9            70.8   22.9%       53.2   18.3%        17.6
7 Russe           593.1           199.3   25.2%      150.0   20.2%        49.3
8 Sofia           141.1            42.6   23.2%       31.8   18.4%        10.8
9 Haskovo         356.1           107.9   23.3%       81.6   18.7%        26.3
Whole country    3563.           1090.0   23.4%      812.2   18.6%       277.8



The yield of wheat (in thousand tons) in the Bulgarian regions as well as in the 
whole country is given in column 2 of these tables. The estimated wheat losses 
(in thousand tons and %) are given in the third column. The virtual losses in 
case that the traffic emissions in Europe are reduced by 90 % are given in the 
next column, and the corresponding savings (in thousand tons) are given in the 
last column of the tables. 
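The relation between the table columns can be checked with a short calculation. The sketch below assumes that the loss percentages are expressed relative to the expected yield y + z (yield without ozone exposure), which is consistent with the relative yield in (1) and with the tabulated values for the basic scenario; this reading is an inference from the numbers, not a statement made in the text. The function names are hypothetical and the inputs are the Sofia City and Burgas rows of Table 1.

```python
def loss_percent(actual_yield, loss):
    """Loss as a percentage of the expected yield (actual yield plus loss)."""
    expected_yield = actual_yield + loss
    return 100.0 * loss / expected_yield


def savings(basic_loss, traffic_loss):
    """Savings: loss under the basic scenario minus loss under the traffic scenario."""
    return basic_loss - traffic_loss


if __name__ == "__main__":
    # Sofia City, 1990 (Table 1): yield 34.5, basic loss 12.7, traffic loss 8.6
    print(round(loss_percent(34.5, 12.7), 1), round(savings(12.7, 8.6), 1))
    # Burgas, 1990 (Table 1): yield 671.1, basic loss 231.7, traffic loss 161.6
    print(round(loss_percent(671.1, 231.7), 1), round(savings(231.7, 161.6), 1))
```

Small differences in the last digit are to be expected, since the tabulated yields and losses are themselves rounded.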

6 Estimated Wheat Losses in Denmark 

The wheat losses in Denmark (by regions as well as for the whole country) have
also been studied. Instead of tables, the losses (in %) for 1990, 1994, 1998 and
the average values for the period 1989 - 1998 are presented as plots in Fig. 2.
The percentages of losses in Denmark are about half of those in Bulgaria,
because of the lower AOT40 values for Denmark. The losses in Denmark seem
to be more sensitive to the reduction of the traffic emissions, as seen from
the results for the Traffic scenario (the right-hand side plots in the figure).

7 Concluding Remarks 

The results reported in this paper indicate that the current levels of AOT40 are 
causing rather big losses of the wheat yield, especially in Bulgaria. The study 
has been carried out over a time interval of ten years (1989-1998). The amount 
of losses varies considerably from one year to another. The variations are caused 
both by the fact that the meteorological conditions are changing from one year 
to another and by the fact that the European anthropogenic emissions were 
gradually reduced in the studied ten-year period. 

The effect of the reduction of the traffic emissions is stronger in Denmark
than in Bulgaria. This can be explained by the higher gradient of the AOT40
Fig. 2. Wheat losses in the Danish regions under the Basic scenario (left plots)
and the Traffic scenario (right plots) for 1990, 1994, 1998, and the average losses
for the period 1989-98
values over Denmark due to its location between the zone of high ozone levels
around the German-Polish border and the zone of clean air over Central Scandinavia.

This work can be generalized in at least two directions. The method and the
algorithms described can easily be adjusted to cover other countries or groups
of countries, even the whole of Europe. They can also be used to calculate the
total agricultural losses from ozone exposure, taking into account the varying
sensitivity of different types of crops. The main obstacles are obtaining the
necessary input data and the lack of experimental studies of the sensitivity of
a wider variety of crops to ozone exposure.



Acknowledgments 

This research was partially supported by NATO under projects ENVIR.CGR
930449 and OUTS.CGR.960312, by the EU ESPRIT projects WEPTEL
(#22727) and EUROAIR (#24618), and by the Ministry of Education and Science
of Bulgaria under grants 1-811/98 and 1-901/99. It was also partly supported
by a grant from the Nordic Council of Ministers. A grant from the Danish
Natural Sciences Research Council gave us access to all Danish supercomputers.



References 

1. M. Amann, I. Bertok, J. Cofala, F. Gyarfas, C. Heyes, Z. Klimont, M. Makowski,
W. Schöpp and S. Syri. Cost-effective control of acidification and ground-level ozone.
Seventh Interim Report of IIASA, A-2361 Laxenburg, Austria, 1999.

2. I. Dimov, Tz. Ostromsky, I. Tzvetanov, Z. Zlatev. Economical Estimation of the
Losses of Crops Due to High Ozone Levels. In Notes on Numerical Fluid Mechanics,
Vol. 73, 2000, pp. 275-282.

3. Position paper for ozone. European Commission, Directorate XI: "Environment,
Nuclear Safety and Civil Protection", Brussels, 1999.

4. J. Fuhrer and B. Achermann (eds.). Critical levels for ozone. Proc. UN-ECE
Workshop on Critical Levels for Ozone, Swiss Federal Research Station for
Agricultural Chemistry and Environmental Hygiene, Liebefeld-Bern, Switzerland, 1994.

5. J. Fuhrer, L. Skärby and M. R. Ashmore. Critical levels for ozone effects on
vegetation in Europe. Environmental Pollution, Vol. 97, 1-2, 1997, pp. 91-106.

6. L. Kärenlampi and L. Skärby (eds.). Critical Levels for Ozone in Europe:
Testing and Finalizing the Concepts. Proc. UN-ECE Workshop on Critical Levels for
Ozone, University of Kuopio, Finland, 1996.

7. H. Pleijel. Statistical aspects of critical levels for ozone based on yield reductions
in crops. In "Critical Levels for Ozone in Europe: Testing and Finalizing the
Concepts" (L. Kärenlampi and L. Skärby, eds.). University of Kuopio, Finland, 1996,
pp. 138-150.

8. Statistical Yearbook of Bulgaria, Vol. 90, ..., 99, Statistical Institute - BAS, Sofia.

9. Statistisk Årbog - Danmark, Vol. 90, ..., 99, Danmarks Statistik, Copenhagen.

10. Z. Zlatev, J. Christensen and Ø. Hov. An Eulerian model for Europe with nonlinear
chemistry. J. Atmos. Chem., 15, 1992, pp. 1-37.

11. Z. Zlatev, I. Dimov and K. Georgiev. Studying long-range transport of air
pollutants. Computational Sci. & Eng., 1, 1994, pp. 45-52.



A Homotopic Residual Correction Process* 



V. Y. Pan 

Mathematics and Computer Science Department, Lehman College, CUNY 
Bronx, NY 10468, USA 
vpan@lehman.cuny.edu



Abstract. We present a homotopic residual correction algorithm for the 
computation of the inverses and generalized inverses of structured matri- 
ces. The algorithm simplifies the process proposed in [P92], and so does 
our analysis of its convergence rate, compared to [P92]. The algorithm 
promises to be practically useful. 
1 Introduction

Residual correction processes for matrix inversion (in particular Newton's
iteration) have been known for a long time [S33, IK66] but remained unpopular
because they involve expensive matrix multiplications in each step. (However,
they can be effectively implemented on parallel computers, and they have the
advantage of converging to the Moore-Penrose generalized inverse where the
input matrix is singular.) It was recognized later on [P92, P93, P93a, PZHD97],
[PBRZ99, PR00, PRW00, PRW,a, BM,a, P00] that such processes are highly
effective in the case of structured input matrices M because multiplication of
structured matrices is inexpensive.

It was required, however, to modify the processes in order to preserve the
structure, which without special care deteriorates rapidly in the process of the
computation. In particular, the displacement rank of a Toeplitz-like matrix can
be tripled in each step of Newton's iteration. To counter the problem, it was
proposed in [P92, P93, P93a] (cf. also [PBRZ99] and the extensions to
non-Toeplitz-like structures in [PZHD97, PR00, PRW00, PRW,a], and [P00]) to
recover the structure by periodically zeroing a few smallest singular values of the
displacement matrices of the computed approximations to $M^{-1}$, that is, to rely
on the numerical (SVD based) displacement rank.

It was proved in [P92] and [P93] that the truncation of the s smallest singular
values of the displacement matrix associated with a computed approximation
to $M^{-1}$ increases the residual norm by at most a factor of sn for an n x n

* Supported by NSF Grant CCR9732206 and PSC CUNY Award 61393-0030. 
matrix M. Such an estimated growth is immaterial where the residual norm
is small enough because it is at least squared in each iteration step, but the
convergence can be destroyed by the truncation of the singular values unless a
sufficiently close initial approximation to $M^{-1}$ is available.

The latter requirement, however, can be partly relaxed based on the homotopy
technique proposed in [P92]. The idea is to start with an easily invertible
matrix (say, with the identity matrix $M_0 = I$) and to invert recursively the
matrices $M_i = t_i M + (1 - t_i) I$, where the $t_i$ increase monotonically
from 0 to 1 and where the differences $t_i - t_{i-1}$ are sufficiently small, so
that $M_{i-1}^{-1}$ serves as a close initial approximation to $M_i^{-1}$.

In [P92] a variant of this approach was specified and analyzed for
Toeplitz-like input, though in a way directed towards asymptotic computational
complexity estimates rather than effective practical implementation.

In the present paper, we simplify the latter variant, make it more convenient
for implementation, and generalize the choice of the initial approximation to
cover non-Toeplitz-like structures as well.

Our attention is on the design and the study of a homotopic process, where
the residual correction algorithm is used as a black box subroutine (based,
e.g., on the algorithms of [PBRZ99] in the Toeplitz or Toeplitz-like cases),
for which we just supply the required input matrices $M_i$ and $M_{i-1}^{-1}$
and obtain the output matrix $M_i^{-1}$. We describe this process for a general
real symmetric (or Hermitian) positive definite matrix M, though the promising
applications are where M is a Toeplitz, Toeplitz-like, or another structured
matrix, because in the latter case the residual correction process is most
effective [P92, P93, P93a, PBRZ99, PR00, PRW00, PRW,a, BM,a, P00].

The required positive bound $\theta$ on the initial residual norm is in our
hands: we may choose it as small as we like, but the number of the required
homotopic steps is roughly proportional to $1/\theta$ (see section 4).

Our analysis shows the efficacy of the proposed approach. 

We organize the presentation as follows. In section 2, we briefly recall the
residual correction process. In section 3, we describe our basic homotopic
approach. In section 4, we estimate the number of its required homotopic steps
and comment on the choice of $\theta$. In section 5, we generalize its initial
step and analyze the generalized process.



2 Residual Correction Processes 

A crude initial approximation $X_0$ to the inverse of a real symmetric (or
Hermitian) positive definite matrix M can be rapidly improved by the method of
residual correction:

$$X_{i+1} = X_i \sum_{k=0}^{p-1} R_i^k, \quad i = 0, 1, \ldots, \qquad (1)$$

where we write

$$R_i = R(M, X_i) = I - M X_i. \qquad (2)$$
(1) and (2) imply that

$$R_h = (R_0)^{p^h}, \quad h = 1, 2, \ldots, \qquad (3)$$

which shows that process (1) converges to the matrix $M^{-1}$ with order $p$,
provided that

$$\rho(R_0) = \|R_0\|_2 \le \theta < 1, \quad R_0 = R(M, X_0), \qquad (4)$$

for a fixed real $\theta$. To optimize both the computational work per step (1) and
the number of steps required to ensure the desired upper bound on $\rho(R_h)$, one
should choose $p = 3$ ([IK66], pages 86-88), but even better results can be reached
by using the scaled process (1) for $p = 2$, which is actually Newton's scaled process:

$$X_{i+1} = c_{i+1} X_i (I + R_i), \qquad (5)$$

for appropriate scalars $c_{i+1}$ [PS91]. We will apply algorithms (1) or (5) as black
box subroutines in a homotopic process, which starts with the trivial inversion
of the identity matrix $I$, ends in computing $M^{-1}$, and fulfils (4) at every
intermediate step. We recall that the residual correction algorithms are strongly
stable numerically (even when the input matrix $M$ is singular [PS91]) and our
process inherits this property.
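As a minimal sketch of process (1), the following Python code applies residual correction to a small dense matrix. The helper names (`matmul`, `residual`, `residual_correction`), the 2x2 test matrix, and the starting guess $X_0 = I/5$ are illustrative assumptions and are not taken from [PS91] or [PBRZ99]; a practical implementation would exploit the matrix structure instead of dense arithmetic, and the scaling factors $c_{i+1}$ of (5) are omitted here.

```python
def matmul(A, B):
    """Dense matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def residual(M, X):
    """R(M, X) = I - M X, as in (2)."""
    n = len(M)
    MX = matmul(M, X)
    return [[(1.0 if i == j else 0.0) - MX[i][j] for j in range(n)] for i in range(n)]


def residual_correction(M, X0, p=2, steps=6):
    """Process (1): X_{i+1} = X_i (I + R_i + ... + R_i^{p-1})."""
    n = len(M)
    X = X0
    for _ in range(steps):
        R = residual(M, X)
        total = identity(n)   # accumulates I + R + ... + R^{p-1}
        power = identity(n)   # current power of R
        for _ in range(p - 1):
            power = matmul(power, R)
            total = [[total[i][j] + power[i][j] for j in range(n)] for i in range(n)]
        X = matmul(X, total)
    return X


def max_abs(A):
    return max(abs(v) for row in A for v in row)


# Symmetric positive definite example; its eigenvalues lie below 5,
# so X0 = I/5 gives an initial residual norm below 1, as (4) requires.
M = [[4.0, 1.0], [1.0, 3.0]]
X0 = [[0.2, 0.0], [0.0, 0.2]]
X = residual_correction(M, X0)
```

With $p = 2$ the residual norm is squared at every step, in agreement with (3), so a handful of steps suffices once (4) holds.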



3 A Homotopic Residual Correction Process 

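Let spectrum$(M) = \{\lambda_1, \ldots, \lambda_n\}$, where

$$\lambda^+ \ge \lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n \ge \lambda^- > 0 \qquad (6)$$

and where $\lambda^+$ and $\lambda^-$ are known values. Let us write $M_0 = M + t_0 I$, $t_0 = \lambda^+/\theta$,
$0 < \theta < 1$. Then

$$R(M_0, t_0^{-1} I) = I - t_0^{-1} M_0 = -t_0^{-1} M, \quad \rho(R(M_0, t_0^{-1} I)) = \|R(M_0, t_0^{-1} I)\|_2 \le \theta,$$

and $M_0$ is inverted rapidly by processes (1), (5). Let us further write

$$M_{h+1} = t_{h+1} I + M = M_h - \Delta_h I, \quad \Delta_h = t_h - t_{h+1} > 0, \quad h = 0, 1, \ldots. \qquad (7)$$

Then we have

$$R(M_{h+1}, M_h^{-1}) = \Delta_h M_h^{-1},$$

$$r_{h+1} = \|R(M_{h+1}, M_h^{-1})\|_2 \le \Delta_h \|M_h^{-1}\|_2 \le \Delta_h/(t_h + \lambda^-).$$

We choose

$$\Delta_h = (t_h + \lambda^-)\theta, \quad h = 0, 1, \ldots, H - 1, \qquad (8)$$

which implies that $r_{h+1} \le \theta$ for all $h$, and we recursively invert the matrices
$M_{h+1}$ by applying process (1), (5) as long as $t_{h+1}$ remains positive. As soon
as we arrive at $t_H \le 0$, we invert $M$ instead of $M_H$.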
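The homotopic process of (7) and (8) can be sketched as follows, with unscaled Newton steps ($p = 2$, all $c_{i+1} = 1$) playing the role of the black box subroutine. The function names, the fixed number of six Newton steps per homotopic step, and the dense 2x2 example are assumptions made for illustration only; the intended applications use structured matrices.

```python
def matmul(A, B):
    """Dense matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def shifted(M, t):
    """Return M_h = t I + M."""
    n = len(M)
    return [[M[i][j] + (t if i == j else 0.0) for j in range(n)] for i in range(n)]


def newton(A, X, steps=6):
    """Black box residual correction with p = 2: X <- X (I + R) = X (2I - A X)."""
    n = len(A)
    for _ in range(steps):
        AX = matmul(A, X)
        correction = [[(2.0 if i == j else 0.0) - AX[i][j] for j in range(n)]
                      for i in range(n)]
        X = matmul(X, correction)
    return X


def homotopic_inverse(M, lam_plus, lam_minus, theta=0.5):
    """Invert M through the matrices M_h = t_h I + M with t_h decreasing to 0."""
    n = len(M)
    t = lam_plus / theta                                   # t_0
    X = [[(1.0 / t if i == j else 0.0) for j in range(n)] for i in range(n)]
    X = newton(shifted(M, t), X)                           # invert M_0 from t_0^{-1} I
    homotopic_steps = 0
    while t > 0:
        t_next = t - theta * (t + lam_minus)               # (7) with Delta_h of (8)
        t = max(t_next, 0.0)                               # last step inverts M itself
        X = newton(shifted(M, t), X)                       # start from M_h^{-1}
        homotopic_steps += 1
    return X, homotopic_steps


# Eigenvalues of M are about 2.38 and 4.62, so 5 and 2 are valid bounds.
M = [[4.0, 1.0], [1.0, 3.0]]
X, H = homotopic_inverse(M, lam_plus=5.0, lam_minus=2.0)
```

In this example $t_0 = 10$, $t_1 = 4$, $t_2 = 1$, and the next value is negative, so the third homotopic step inverts $M$ directly.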
Remark 1. The requirement of nonsingularity of $M$ can be relaxed if we simply
replace (6) by the requirement that

$$\lambda^+ \ge \lambda_1 \ge \cdots \ge \lambda_r \ge \lambda^- > 0, \quad \lambda_i = 0 \text{ for } i > r = \operatorname{rank} M.$$

The only resulting changes in the homotopic process are that its convergence will
be to the Moore-Penrose generalized inverse of $M$, and that $\lambda^-$, now a lower bound
on the positive eigenvalues $\lambda_1, \ldots, \lambda_r$, will be used in (8)
and in our subsequent estimates for the number of homotopic steps.

Remark 2. The approach allows variations. For instance, instead of (7), we may
apply the following dual process:

$$M_{h+1} = I + t_{h+1} M = M_h + (t_{h+1} - t_h) M, \quad h = 0, 1, \ldots,$$

followed at the end by a single step (7) or a few steps (7). The resulting
computations can be analyzed similarly to (7).



4 Estimating the Number of Homotopic Steps 

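By (7) and (8) we have $t_{h+1} = (1 - \theta) t_h - \theta\lambda^-$, $h = 0, 1, \ldots, H - 1$. Therefore,

$$t_1 = (1 - \theta) t_0 - \theta\lambda^-,$$

$$t_2 = (1 - \theta) t_1 - \theta\lambda^- = (1 - \theta)^2 t_0 - ((1 - \theta) + 1)\theta\lambda^-,$$

$$t_3 = (1 - \theta) t_2 - \theta\lambda^- = (1 - \theta)^3 t_0 - ((1 - \theta)^2 + (1 - \theta) + 1)\theta\lambda^-,$$

and recursively, we obtain that

$$t_h = (1 - \theta)^h t_0 - \sum_{i=0}^{h-1} (1 - \theta)^i \theta\lambda^- = (1 - \theta)^h t_0 - (1 - (1 - \theta)^h)\lambda^-, \quad h = 1, 2, \ldots.$$

We have $t_H \le 0$ if $(1 - \theta)^H t_0 \le (1 - (1 - \theta)^H)\lambda^-$. Substitute $t_0 = \lambda^+/\theta$ and
rewrite the latter bound as follows:

$$\frac{\lambda^+}{\theta\lambda^-} \le \frac{1}{(1 - \theta)^H} - 1, \qquad \frac{1}{(1 - \theta)^H} \ge \frac{\lambda^+}{\theta\lambda^-} + 1,$$

$$H \ge -\log(1 + \lambda^+/(\theta\lambda^-))/\log(1 - \theta).$$

We choose the minimum integer $H$ satisfying this bound, that is,

$$H = \lceil \log(1 + \lambda^+/(\theta\lambda^-))/\log(1/(1 - \theta)) \rceil.$$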
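The closed-form count can be checked against a direct simulation of the recurrence $t_{h+1} = (1 - \theta) t_h - \theta\lambda^-$. The sketch below is only a numerical illustration; the function names and the sample values of $\lambda^+$, $\lambda^-$, and $\theta$ are assumptions.

```python
import math


def homotopic_steps(lam_plus, lam_minus, theta):
    """Closed-form H = ceil(log(1 + lam_plus/(theta*lam_minus)) / log(1/(1 - theta)))."""
    return math.ceil(math.log(1.0 + lam_plus / (theta * lam_minus))
                     / math.log(1.0 / (1.0 - theta)))


def simulate_steps(lam_plus, lam_minus, theta):
    """Count the steps of t_{h+1} = (1 - theta) t_h - theta*lam_minus until t_h <= 0."""
    t = lam_plus / theta      # t_0
    h = 0
    while t > 0:
        t = (1.0 - theta) * t - theta * lam_minus
        h += 1
    return h
```

For example, with $\lambda^+ = 100$, $\lambda^- = 1$, and $\theta = 0.5$, both functions return 8, because $\log_2 201 \approx 7.65$.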

The scaled Newton's iteration of [PS91] yields the bound of roughly $\log(\lambda^+/\lambda^-)$
on the overall number of steps (5) for $p = 2$, which is superior to the above
bound on $H$ because each homotopic step generally requires a few steps (5).
Our homotopic process, however, has an important advantage in application
to structured matrices $M$, as we explained in the introduction. By the latter
estimate, we should choose a larger $\theta$ to decrease the number of homotopic
steps, $H$, but in applications to the inversion of structured matrices we should
keep $\theta$ small enough to preserve the convergence under the truncation of the
smallest singular values. In the Toeplitz-like case, recall that such a truncation
increases the residual norm by at most the factor of $sn$ (see [P93] and our
section 1) and conclude that the choice of the value $\theta = 0.5/(sn)$ is clearly
sufficient. The bound $sn$, however, is overly pessimistic according to the extensive
numerical experiments reported in [BM,a]. Thus the choice of much larger values
of $\theta$ should be sufficient, and heuristics seem to be most appropriate here.

5 Unified Initialization Rule for the Homotopic Process 

The initial choice of $M_0 = M + t_0 I$ preserves the Toeplitz-like structure of $M$ but
may destroy some other matrix structures such as Cauchy-like or Vandermonde-like
ones. This choice, however, can be generalized as follows:

Choose a real symmetric (or Hermitian) positive definite and well conditioned
matrix $M_0$. Let it also be readily invertible and let it share its structure with the
input matrix $M$, so that the matrices $tM_0 + M$ are structured for any scalar $t$.
Recursively define the matrices

$$M_{h+1} = t_{h+1} M_0 + M = M_h + (t_{h+1} - t_h) M_0, \quad h = 0, 1, \ldots, H - 1,$$

where $t_1 = 1 > t_2 > \cdots > t_{H-1} > t_H = 0$, and write $t_0 = 0$.

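Now, let spectrum$(M_0) = \{\mu_1, \ldots, \mu_n\}$, where $\mu^+ \ge \mu_1 \ge \mu_2 \ge \cdots \ge \mu_n \ge \mu^- > 0$,
and $\mu^+$ and $\mu^-$ are available. Let us write $\kappa^+ = \mu^+/\mu^-$, recall that
$\|M_h^{-1}\|_2 \le 1/(t_h \mu_n + \lambda_n)$ for all $h$ (cf., e.g., [Par80], p. 191) and that
$\|M_0\|_2 = \mu_1 \le \mu^+$, and deduce that

$$\|I - (t_1 M_0)^{-1} M_1\|_2 = \|M_0^{-1} M/t_1\|_2 \le \|M_0^{-1}\|_2 \|M\|_2/t_1 \le \lambda^+/(t_1 \mu^-).$$

Now, we choose

$$t_1 = \lambda^+/(\mu^- \theta), \qquad (9)$$

so that $\|I - (t_1 M_0)^{-1} M_1\|_2 \le \theta$, and we invert $M_1$ by applying processes (1) or
(5) for $X_0 = t_1^{-1} M_0^{-1}$.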

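Next we deduce that

$$I - M_h^{-1} M_{h+1} = (t_h - t_{h+1}) M_h^{-1} M_0,$$

$$\|I - M_h^{-1} M_{h+1}\|_2 \le (t_h - t_{h+1}) \|M_h^{-1}\|_2 \|M_0\|_2.$$

Substitute

$$\|M_h^{-1}\|_2 \le 1/(t_h \mu_n + \lambda_n), \quad \|M_0\|_2 \le \mu^+,$$

and obtain that $\|I - M_h^{-1} M_{h+1}\|_2 \le \theta$ if $(t_h - t_{h+1})\mu^+/(t_h \mu_n + \lambda_n) \le \theta$,
which holds, in particular, if $t_{h+1} \ge t_h (1 - \theta/\kappa^+) - \theta\lambda^-/\mu^+$.

Thus, we choose

$$t_{h+1} = t_h (1 - \theta/\kappa^+) - \theta\lambda^-/\mu^+ \qquad (10)$$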

and invert $M_{h+1}$ by applying processes (1) or (5) for $X_0 = M_h^{-1}$ and for
$h = 1, 2, \ldots, H - 2$, until $t_{h+1}$ of (10) becomes nonpositive for $h = H - 1$. By (9) and
(10), this must occur for $H \le 1 + \lfloor (\log t_1)/\log(1 - \theta/\kappa^+) \rfloor$ and $t_1$ of (9).
Abstract. Monte Carlo (MC) methods have proved to be flexible, robust
and very useful techniques in computational finance. Several studies
have investigated ways to achieve greater efficiency of such methods
on serial computers. In this paper, we concentrate on the parallelization
potential of MC methods. While MC is generally thought to
be "embarrassingly parallel", the results eventually depend on the quality
of the underlying parallel pseudo-random number generators. There
are several methods for obtaining pseudo-random numbers on a parallel
computer and we briefly present some alternatives. Then, we turn to an
application of security pricing where we empirically investigate the pros
and cons of the different generators. This also allows us to assess the
potential of parallel MC in the computational finance framework.



1 Introduction 

The Monte Carlo (MC) method [18,10] is widely applied to large and complex 
problems to obtain approximate solutions. This method has been successfully 
applied to problems in physical sciences and, more recently, in finance. Many 
difficult financial engineering problems such as the valuation of multidimensional 
options, path-dependent options, stochastic volatility or interest rate options can 
be tackled thanks to this technique. 

An option (also called derivative security) is a security the payoff of which 
depends on one or several other underlying securities. The prices of these un- 
derlying securities are often modeled as continuous-time stochastic processes. 
Assuming that no arbitrage exists, one can show that the price of such an option 
is the discounted expected value of the payoffs under the risk neutral measure, 
see e.g. [5]. In such a framework, pricing an option that can be written as an 
expectation of a random variable lends itself naturally to a numerical procedure 
that estimates this expected value through simulation. 

Generally, the MC procedure involves generating a large number of realiza- 
tions of the underlying process and, using the law of large numbers, estimating 
the expected value as the mean of the sample. In our framework this translates 
into Algorithm 1. 

We note that the standard deviation of the MC estimate $\hat{C}$ decreases at
the order $O(1/\sqrt{N})$ and thus that reducing it by a factor of 10 requires
increasing the number of simulation runs $N$ by a factor of 100.
Algorithm 1 Monte Carlo

1: for j = 1 to N do
2:   Simulate sample paths of the underlying variables (asset prices,
     interest rates, etc.) using the risk neutral measure over the
     time frame of the option. For each simulated path, evaluate the
     discounted cash flows of the derivative $C_j$
3: end for
4: Average the discounted cash flows over the sample paths
   $\hat{C} = \frac{1}{N} \sum_{j=1}^{N} C_j$
5: Compute the standard deviation
   $\hat{\sigma}_C = \sqrt{\frac{1}{N-1} \sum_{j=1}^{N} (C_j - \hat{C})^2}$
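Algorithm 1 can be sketched in Python for the simplest case, a European call on a single asset that follows geometric Brownian motion under the risk neutral measure. The function name `mc_european_call`, the parameter values in the usage note, and the use of Python's built-in `random` generator are assumptions made for illustration; the paper does not prescribe them.

```python
import math
import random


def mc_european_call(s0, strike, rate, sigma, maturity, n_paths, seed=0):
    """Monte Carlo price of a European call; returns (mean, std_dev, std_err)."""
    rng = random.Random(seed)
    drift = (rate - 0.5 * sigma ** 2) * maturity
    vol = sigma * math.sqrt(maturity)
    discount = math.exp(-rate * maturity)
    cash_flows = []
    for _ in range(n_paths):
        # Steps 1-3: simulate the terminal price and discount the payoff C_j.
        s_t = s0 * math.exp(drift + vol * rng.gauss(0.0, 1.0))
        cash_flows.append(discount * max(s_t - strike, 0.0))
    # Step 4: sample mean of the discounted cash flows.
    mean = sum(cash_flows) / n_paths
    # Step 5: sample standard deviation; dividing by sqrt(N) gives the
    # standard error of the estimate, which decays like O(1/sqrt(N)).
    variance = sum((c - mean) ** 2 for c in cash_flows) / (n_paths - 1)
    std_dev = math.sqrt(variance)
    return mean, std_dev, std_dev / math.sqrt(n_paths)
```

Calling `mc_european_call(100.0, 100.0, 0.05, 0.2, 1.0, 20000)` should give an estimate close to the Black-Scholes value of about 10.45, with a standard error that shrinks by a factor of 10 when `n_paths` grows by a factor of 100.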



The major advantage of the MC approach is that it easily accommodates 
options with complex payoff functions. Asian options and lookback options are 
two typical examples of path dependent options. The Asian option depends on 
the average value of the underlying asset price and the lookback option depends 
on the maximum or minimum of the underlying asset. In such cases, analytic
formulas do not always exist, or are difficult to construct or to approximate.
However, it is straightforward to adapt the MC procedure to price these options 
by changing the payoff function. 

Monte Carlo can also be helpful when considering the valuation of multi- 
asset options, i.e. options depending on several underlying securities, such as 
for instance index options, basket options or options on the extremum of sev- 
eral assets. As mentioned earlier, the price of the option can be expressed as an 
expectation, which is in this case a multidimensional integral. The higher dimen- 
sion of the problem very quickly becomes a limiting factor with other methods, 
since the complexity of the computations generally grows exponentially with the 
dimension. In this case too, MC is essentially the method of choice, since its
complexity does not depend on the dimension of the problem.¹

Since the principal drawback of the MC method is its slow convergence, 
different strategies have been devised to speed up the process. Variance reduc- 
tion techniques, such as antithetic variates, control variables, stratified and im- 
portance sampling, can be applied. More recently, the use of low discrepancy 
sequences has also helped in certain cases. Several papers have described and 
analyzed the use of Monte Carlo techniques in finance [1,11,3]. Improvements in
the efficiency using variance reduction techniques are thoroughly discussed in [2].

The MC method relies on the use of a pseudo-random number generator 
(RNG) to produce its results. The generation of random numbers is known to be 
difficult since deterministic algorithms are used to obtain “random” quantities. 
Bad RNGs are detrimental to MC simulations. The sequence of an ideal RNG
should:

— be uniformly distributed,

— not be correlated,

— pass statistical tests for randomness,

— have a long period,

— be reproducible,

— be fast,

— be portable,

— require limited memory.

¹ The only assumption is that the function should be square integrable, which usually
is not a very stringent condition.

It is difficult for an RNG to satisfy all these qualities simultaneously. Two
main approaches are used to assess the quality of an RNG: a theoretical study of
the properties of the sequence and a statistical test of the sequence. Yet, certain
generators that perform well in such studies may prove unreliable in certain MC
applications, see [9,20].

The generation of random numbers on parallel computers is usually more
problematic than in the serial case. The streams of numbers produced by different
processors could be correlated, which is referred to as inter-processor correlation.
Such a situation does not arise in the serial case.

In this paper, we will present some of the issues related to the generation of
random numbers on parallel machines. In a second part, a parallel application
of MC to pricing derivative securities will show some of the problems one may
encounter. This complements the more general analysis that can be found on
parallel RNGs, since even generators that perform well in the standard tests
may prove unreliable in certain applications, particularly in MC simulations.

2 Serial Random Number Generators 

Generating random numbers using computers is a difficult topic, but many studies
that help better understand the issues involved have been carried out [13,14].
The most commonly used RNG is the linear congruential generator (LCG). It
is based on the recurrence $y_n = (a y_{n-1} + c) \bmod m$ where $m > 0$ is the modulus,
$a > 0$ the multiplier and $c$ the additive constant. It is usually denoted
LCG$(m, a, c, y_0)$ where $y_0$ is the seed. This produces integers in the range $(0, m)$
and to obtain random numbers uniformly distributed in the interval $(0, 1)$, one
usually divides these integers by $m$, i.e. $x_n = y_n/m$.

These numbers also cycle after at most $m$ steps. When the parameter $c = 0$,
this RNG is sometimes called a multiplicative linear congruential generator and
is denoted by MLCG. For appropriately chosen parameters, these RNGs produce
a sequence of numbers of maximal period, see [13,19]. There is no unique and
undisputed choice of the parameters that guarantees a sequence with maximal
period and has good theoretical and statistical properties. For 32-bit machines
the choice LCG$(2^{31} - 1, 16807, 0, 1)$ proposed in [19], also known as MINSTD for
minimal standard, is a popular one.
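The recurrence and the MINSTD parameterization can be written down in a few lines. This is a minimal sketch; the function names `lcg`, `minstd`, and `to_unit_interval` are illustrative choices, and Python's arbitrary-precision integers sidestep the overflow handling that a 32-bit implementation needs.

```python
def lcg(m, a, c, seed):
    """Generator for LCG(m, a, c, y0): yields y_1, y_2, ... with y_n = (a*y_{n-1} + c) mod m."""
    y = seed
    while True:
        y = (a * y + c) % m
        yield y


def minstd(seed=1):
    """MINSTD, i.e. LCG(2^31 - 1, 16807, 0, seed)."""
    return lcg(2 ** 31 - 1, 16807, 0, seed)


def to_unit_interval(y, m=2 ** 31 - 1):
    """Map an integer state to a uniform variate x_n = y_n / m in (0, 1)."""
    return y / m
```

Starting from the seed 1, the first outputs are 16807 and 282475249, and the 10000th output is 1043618065, the usual check value for this generator.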

The main drawback of LCGs is that the numbers produced have a lattice
structure that affects MC simulations [13]. The $d$-tuples $(x_i, \ldots, x_{i+d-1})$ lie on
parallel hyperplanes of the unit hypercube. Since the gaps between the planes
are never sampled, the numbers produced can be correlated; furthermore, the
higher the dimension $d$, the worse the problem becomes. This bad behavior can
be detected through the spectral test and the LCG parameters should be such
that the distance between the planes is minimized [15].

Many LCGs use a modulus which is a power of 2, since this allows easier
programming and faster execution on binary computers. For instance, the generator
behind the drand48 function in the BSD version of the ANSI C library
has a modulus of $2^{48}$. This generator is exactly described by the
parameterization LCG$(2^{48}, 25214903917, 11, 0)$. Power of 2 moduli have deficiencies
since they produce random numbers that have highly correlated low order bits
and that can show long-range correlations [6]. LCGs with prime modulus have
better randomness properties, but they are more difficult to implement.
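The weakness of the low order bits is easy to observe on the raw state of the LCG$(2^{48}, 25214903917, 11, 0)$ recurrence. The sketch below inspects the integer states directly, which is an assumption made for illustration: it does not call the C library function and does not reproduce how drand48 converts its state to a floating point number.

```python
M48 = 2 ** 48
A48 = 25214903917
C48 = 11


def low_order_bits(count, bits=1, seed=0):
    """Return the lowest `bits` bits of the first `count` states y_1, y_2, ..."""
    y = seed
    out = []
    for _ in range(count):
        y = (A48 * y + C48) % M48
        out.append(y % (1 << bits))
    return out
```

Because the multiplier and the additive constant are both odd, the lowest bit simply alternates, and the lowest $b$ bits repeat with period at most $2^b$, regardless of the 48-bit period of the full state.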

Another type of linear generator of interest is the multiple-recursive generator
(MRG) proposed by L'Ecuyer. It generalizes the MLCG by adding $k$
terms in the recurrence $y_n = (a_1 y_{n-1} + a_2 y_{n-2} + \cdots + a_k y_{n-k}) \bmod m$. The
coefficients $a_i$ are integers in the range $[-(m - 1), (m - 1)]$. The period and
randomness are generally much improved compared with an MLCG, at the cost of
an increase of computation time.

It is possible to combine such generators to produce sequences that are equivalent
to an MRG with very large moduli and therefore very large periods. Details
about these generators and floating point implementations in C for 32-bit machines
can be found in [15]. The specific generator MRG32k3a has period length $2^{191} \approx 10^{57}$,
whereas MRG32k5a has period length $2^{319} \approx 10^{96}$.

3 Parallel Random Number Generators 

As mentioned earlier, a parallel random number generator (PRNG) should have 
extra qualities. The PRNG should also: 

— have the same qualities as serial RNG on one processor, 

— show no inter-processor correlation of the streams, 

— generate the same stream of numbers for a different number of processors, 

— work for any number of processors, 

— keep the communication between processors to the minimum. 

The generation of random numbers on a parallel computer can be based upon 
a serial RNG by distributing the numbers produced among the processors. A 
more modern approach is to parameterize the RNG differently on each processor 
so that different streams of numbers are generated, see [7] for a survey. 

3.1 Leapfrog 

The leapfrog method distributes the numbers of a serial RNG in a cyclic fashion
to each processor, like a deck of cards dealt to players. If we denote by
$(x_i)_{i=0,1,2,\ldots}$ the original sequence and $L$ the lag, then the subsequence
processor $p$ gets is

$$\tilde{x}_i = x_{iL+p} \quad \text{with } p = 0, 1, 2, \ldots, P \le L - 1.$$
If the original sequence is

$$x_0, x_1, \ldots, x_{L-1}, x_L, x_{L+1}, \ldots, x_{2L-1}, x_{2L}, x_{2L+1}, \ldots$$

then the subsequence obtained by processor 0 is

$$x_0, x_L, x_{2L}, \ldots$$



A first problem is that long-range correlations embedded in the RNG can
become short-range correlations in the new sequence and destroy the quality of
the PRNG, see [8].

Secondly, such a scheme is not scalable since when the total number of
processors $P$ increases, the length of the sequence $(\tilde{x}_i)_{i=0,1,2,\ldots}$ decreases.

For this method, we need to easily jump ahead $L$ steps to get the next random
number. This can be carried out with an MLCG since we have

$$y_n = a y_{n-1} \bmod m = (a^n \bmod m)\, y_0 \bmod m,$$

$$y_{iL+p} = (a^L \bmod m)^i\, y_p \bmod m.$$

This shows that the sequence used in the processors now has multiplier $a^L$
instead of $a$ and we cannot ensure that this will be a multiplier with good
properties for all values of $L$. Jumping forward in the sequence can also be done
for LCGs.
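The leapfrog jump-ahead can be sketched with modular exponentiation. The function names and the choice of the MINSTD parameters in the usage note are assumptions made for illustration; the point is only that processor $p$ can run its own MLCG with multiplier $a^L \bmod m$ and seed $y_p$.

```python
def mlcg_sequence(m, a, y0, count):
    """Serial MLCG: returns [y_0, y_1, ..., y_{count-1}] with y_n = a*y_{n-1} mod m."""
    out, y = [], y0
    for _ in range(count):
        out.append(y)
        y = (a * y) % m
    return out


def leapfrog_sequence(m, a, y0, lag, p, count):
    """Subsequence y_p, y_{L+p}, y_{2L+p}, ... obtained without the skipped values."""
    y = (pow(a, p, m) * y0) % m      # seed y_p = (a^p mod m) y_0 mod m
    a_lag = pow(a, lag, m)           # new multiplier a^L mod m
    out = []
    for _ in range(count):
        out.append(y)
        y = (a_lag * y) % m
    return out
```

For instance, with `m = 2**31 - 1`, `a = 16807`, and lag 4, `leapfrog_sequence(m, a, 1, 4, p, 10)` reproduces every fourth element of the serial sequence, starting at position `p`.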



3.2 Sequence Splitting 

In this case, the original sequence is split into blocks and distributed to each
processor. Let us denote the period of the generator by $\rho$, the number of processors
by $P$ and the block length by $L = \lfloor \rho/P \rfloor$; we have

$$\tilde{x}_i = x_{pL+i}, \quad p = 0, 1, 2, \ldots, P.$$

Then the original sequence

$$x_0, x_1, \ldots, x_{L-1}, x_L, x_{L+1}, \ldots, x_{2L-1}, x_{2L}, x_{2L+1}, \ldots$$

is distributed as follows to processors 0, 1, 2, ...

$$x_0, x_1, \ldots, x_{L-1}$$

$$x_L, x_{L+1}, \ldots, x_{2L-1}$$

$$x_{2L}, x_{2L+1}, \ldots, x_{3L-1}$$



For this method, long-range correlations can be emphasized and become
inter-processor correlations. We know that the sequences produced will not
overlap, but cannot be sure that they will not show some correlation. This may again
adversely affect the MC simulations, see [4,8]. Scalability is an issue once again,
as in the previous case.

We need to be able to jump ahead by multiples of $L$ steps to get to the new starting
point for each processor. This can be done with an MLCG by using a different
seed for each processor (see [16]):

$$y_n = a y_{n-1} \bmod m,$$

$$y_{pL} = (a^{pL} \bmod m)\, y_0 \bmod m,$$

$$y_{pL+i} = (a^i \bmod m)\, y_{pL} \bmod m.$$
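Sequence splitting differs from leapfrog only in how the per-processor seed is chosen: each processor keeps the original multiplier but starts at $y_{pL}$. The following sketch uses hypothetical function names and, in the usage note, a deliberately small block length so that the result can be compared with the serial sequence.

```python
def block_seed(m, a, y0, block_len, p):
    """Starting value y_{pL} = (a^{pL} mod m) y_0 mod m for processor p."""
    return (pow(a, p * block_len, m) * y0) % m


def block_sequence(m, a, y0, block_len, p, count):
    """First `count` values of the block assigned to processor p."""
    y = block_seed(m, a, y0, block_len, p)
    out = []
    for _ in range(count):
        out.append(y)
        y = (a * y) % m          # same multiplier a as the serial generator
    return out
```

With `m = 2**31 - 1`, `a = 16807`, and `block_len = 10`, processor 2 obtains the serial values $y_{20}, y_{21}, \ldots$; in practice the block length would be of the order of the period divided by the number of processors.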
3.3 Parameterization

A more recent approach to generate parallel streams of random numbers is based 
on a parameterization of each stream. This can be done in two ways: in certain 
generators, the seed value provides a natural way of dividing the sequence of an 
RNG into independent cycles; the function that outputs the next value in the 
sequence can be parameterized to give a different stream for a different value. 

These ideas are developed in e.g. [17] and implemented in the free package 
SPRNG available at <http://daniel.scri.fsu.edu/RNG/>. This library of pro- 
grams contains several RNGs that can be used in parallel and are scalable. The 
different generators are the following: 

— Modified additive lagged Fibonacci generator, 

— Multiplicative lagged Fibonacci generator, 

— 48-bit linear congruential generator with prime addend, 

— 64-bit linear congruential generator with prime addend, 

— Combined multiple recursive generator,

— Prime modulus linear congruential generator (requires special multi-preci- 
sion library). 

The authors of SPRNG provide a large number of tests and sound theoretical 
background for this package. 

4 Parallel Monte Carlo Option Pricing 

The prices of derivative securities, such as options, are often found analytically by
imposing simplifying assumptions. More recently the advent of powerful numerical
procedures and computers has made possible the pricing of more complex
and more realistic derivatives.

As explained in the introduction, the representation of an option price as an
expectation naturally provides a way to evaluate the price via MC simulation.
The dimension of the integral depends on the number of underlyings and can
become large.

The goal of parallel programming is generally to speed up the computation.
Two important concepts are the speed-up, defined as $S_p = T_1/T_p$ where $T_1$ is the
serial execution time and $T_p$ is the parallel execution time using $p$ processors,
and the efficiency $E_p = S_p/p$, which is the proportion of the time devoted to performing
useful computational work and ranges from 0 to 1. In the best case, the speed-up
is linear in the number of processors used and the efficiency stays constant and
close to 1. The problem is said to be scalable if the efficiency can be kept constant
when increasing the problem size together with the number of processors.

In an MC simulation, no communication takes place if the RNG is well designed.
Therefore the algorithm scales perfectly and adding a processor will
generally decrease the computation time. However, if correlations appear in the
computations of the random numbers, the results may be biased, see [8].



656 Giorgio Pauletto 



4.1 Description of the Option 

The problem we investigate is the pricing of multi-asset options, i.e. options 
depending on several underlying assets. We will in particular consider the pricing 
of a European call option on the maximum of n risky assets. Even though closed 
form solutions exist [12], the computations quickly become burdensome when 
the dimension increases. 

The underlying assets have prices $S_1(t), S_2(t), \ldots, S_n(t)$ at time $t = 0, \ldots, T$
and the respective strike prices are $K_1, K_2, \ldots, K_n$. We also assume the usual
lognormal diffusion process

$$\frac{dS_i}{S_i} = \mu_i\, dt + \sigma_i\, dZ_i, \quad i = 1, 2, \ldots, n,$$

where $\mu_i$ and $\sigma_i$ denote respectively the expected rate of return and volatility
and $dZ_i$ is the Wiener process for asset $i$. These processes can be correlated and
$\rho_{ij}$ denotes the correlation coefficient between $dZ_i$ and $dZ_j$.

The price of the call at maturity time $T$ is

$$C(T) = \max\{\max(S_1(T) - K_1, S_2(T) - K_2, \ldots, S_n(T) - K_n), 0\}$$

and what we look for is the value of this option at time 0, $C(0)$.

The steps for pricing such options with MC are described in Algorithm 2. 



Algorithm 2 Monte Carlo pricing of a multi-asset option

1: Decompose the correlation matrix with Cholesky $\Sigma = LL'$
2: for j = 1 to N do
3:   Generate an n dimensional vector of unit normal random values $z$
4:   Transform $\tilde{z} = Lz$
5:   Compute the discounted cash flows of the derivative $C_j$
6: end for
7: Average the discounted cash flows over the sample paths
   $\hat{C} = \frac{1}{N} \sum_{j=1}^{N} C_j$
8: Compute the standard deviation
   $\hat{\sigma}_C = \sqrt{\frac{1}{N-1} \sum_{j=1}^{N} (C_j - \hat{C})^2}$



As one can see, the only part of this computation that can be run in parallel is the main
MC loop, since the random variables have to be combined into a multivariate
normal vector. This is in contrast with other studies such as [8] that distribute
the computation along the dimension $n$.

We nonetheless expect to show that long-range correlations among multiple
parallel streams from LCGs produce spurious results when using consecutive
blocks. The use of the SPRNG package should resolve the problem since one can
generate many uncorrelated streams on different processors. The package
implements the algorithms so that they are scalable, which should also remove a
second drawback of the splitting or blocking schemes.
Abstract. The main results of a componentwise error analysis for a
parallel partitioning algorithm [7] in the case of banded linear systems
are presented. It is shown that for some special classes of matrices, i.e.
diagonally dominant (row or column), symmetric positive definite, and
M-matrices, the algorithm is numerically stable. For the case when the
matrix of the system does not belong to the considered classes, a stabilized
version of the algorithm is presented.



1 Introduction 

A well-known algorithm for solving tridiagonal systems in parallel is the method
of Wang [7]. A full roundoff error analysis of this algorithm can be found in [8].

A generalized version of this parallel partitioning algorithm was later
applied by other authors [2,5]. A backward componentwise error analysis of
this generalized version can be found in [9]. In that work, bounds on the
equivalent perturbations depending on three constants are obtained, and then
bounds on the forward error depending on two types of condition numbers
are presented.

In the present work we consider more precisely the case when the matrix of the
system belongs to one of the following classes: diagonally dominant, symmetric
positive definite, or M-matrices.

First, we present a brief description of the algorithm (for banded systems
only). Let the linear system under consideration be denoted by

$$Ax = d, \qquad (1)$$

where $A \in \mathbb{R}^{n \times n}$ has bandwidth $2j + 1$. For simplicity we assume that
$n = ks - j$ for some integer $k$, where $s$ is the number of parallel processors we
want to use. We partition matrix $A$ and the right hand side $d$ of system (1) as



* This work was supported by Grants MM-707/97 and 1-702/97 from the National 
Scientific Research Fund of the Bulgarian Ministry of Education and Science. 



L. Vulkov, J. Wasniewski, and P. Yalamov (Eds.): NAA 2000, LNCS 1988, pp. 658—665, 2001. 
(c) Springer- Verlag Berlin Heidelberg 2001 



Stability of a Parallel Partitioning Algorithm 659 



follows: 
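where $B_i \in \mathbb{R}^{(k-j) \times (k-j)}$ are band matrices with the same bandwidth as matrix $A$,
$a_i, c_i \in \mathbb{R}^{(k-j) \times j}$, $a_{ik}, b_{ik}, c_{ik} \in \mathbb{R}^{j \times j}$, $x_i, d_i \in \mathbb{R}^{k-j}$, $x_{ik}, d_{ik} \in \mathbb{R}^{j}$.

After suitable permutation of the rows and columns of matrix $A$ we obtain
the system

$$\tilde{A} P x = P d, \quad \tilde{A} = P A P^T = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix},$$

where $P$ is a permutation matrix, $A_{11} = \operatorname{diag}\{B_1, B_2, \ldots, B_s\} \in \mathbb{R}^{s(k-j) \times s(k-j)}$,
$A_{22} = \operatorname{diag}\{b_k, b_{2k}, \ldots, b_{(s-1)k}\} \in \mathbb{R}^{j(s-1) \times j(s-1)}$, and $A_{12} \in \mathbb{R}^{s(k-j) \times j(s-1)}$,
$A_{21} \in \mathbb{R}^{j(s-1) \times s(k-j)}$ are sparse matrices. Evidently, the permutation does not
influence the roundoff error analysis.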

The algorithm can be presented as follows.

Stage 1. Obtain the block LU-factorization

$$\tilde{A} = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix} = LU = \begin{pmatrix} A_{11} & 0 \\ A_{21} & I_{j(s-1)} \end{pmatrix} \begin{pmatrix} I_{s(k-j)} & R \\ 0 & S \end{pmatrix}$$

by the following steps:

1. Obtain the LU-factorization of $A_{11} = P_1 L_1 U_1$ with partial pivoting, if
necessary. Here $P_1$ is a permutation matrix, $L_1$ is unit lower triangular, and $U_1$
is upper triangular.

2. Solve $A_{11} R = A_{12}$ using the LU-factorization from the previous item, and
compute $S = A_{22} - A_{21} R$, which is the Schur complement of $A_{11}$ in $\tilde{A}$.

Stage 2. Solve $Ly = d$ by using the LU-factorization of $A_{11}$ (Stage 1).

Stage 3. Solve $Ux = y$ by applying Gaussian elimination to the block $S$.

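The block $R$ is quite sparse and has the following form:

$$R = \begin{pmatrix} p^{(1)} & & & \\ q^{(2)} & p^{(2)} & & \\ & \ddots & \ddots & \\ & & q^{(s-1)} & p^{(s-1)} \\ & & & q^{(s)} \end{pmatrix} \in \mathbb{R}^{s(k-j) \times j(s-1)},$$

where

$$p^{(i)} = (p_{(i-1)k+1}, p_{(i-1)k+2}, \ldots, p_{ik-1})^T \in \mathbb{R}^{(k-j) \times j},$$

$$q^{(i)} = (q_{(i-1)k+1}, q_{(i-1)k+2}, \ldots, q_{ik-1})^T \in \mathbb{R}^{(k-j) \times j}.$$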



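Let us note that matrix $S$ (the so called reduced matrix) is block tridiagonal,
and banded with bandwidth $4j - 1$:

$$S = \begin{pmatrix} V_1 & W_1 & & & \\ U_2 & V_2 & W_2 & & \\ & \ddots & \ddots & \ddots & \\ & & U_{s-2} & V_{s-2} & W_{s-2} \\ & & & U_{s-1} & V_{s-1} \end{pmatrix},$$

where the entries are computed in the following way:

$$U_i = -a_{ik} q_{ik-1}, \quad V_i = b_{ik} - a_{ik} p_{ik-1} - c_{ik} q_{ik+1}, \quad W_i = -c_{ik} p_{ik+1}. \qquad (2)$$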



2 Main Stability Results 

In the following, a hat denotes computed quantities. By $\Delta T$ we denote
an equivalent perturbation in matrix $T$, and by $\rho_0$ we denote the roundoff unit.
The matrix inequalities are understood componentwise.

In our previous work (see [9]), bounds were obtained first for the
backward error:

$$|\Delta A| \le |A|\, h_1(\rho_0) + |A||N|\, h_2(\rho_0),$$

where

$$h_1(\rho_0) = K_1 f(\rho_0) + K_2 h(\rho_0) + K_1 K_2 f(\rho_0) h(\rho_0) + K_1 f(\rho_0) g(\rho_0) + K_2 h(\rho_0) g(\rho_0) + K_1 K_2 f(\rho_0) h(\rho_0) g(\rho_0),$$

$$h_2(\rho_0) = 3K_1 f(\rho_0) + 2K_2 h(\rho_0) + 2K_1 K_2 f(\rho_0) h(\rho_0) + 3K_1 f(\rho_0) g(\rho_0) + 3K_2 h(\rho_0) g(\rho_0) + 3K_1 K_2 f(\rho_0) h(\rho_0) g(\rho_0) + K_1 f(\rho_0) g^2(\rho_0) + K_2 h(\rho_0) g^2(\rho_0) + K_1 K_2 f(\rho_0) h(\rho_0) g^2(\rho_0),$$

and for the forward error it is true that

$$\frac{\|\hat{x} - x\|_\infty}{\|x\|_\infty} \le \operatorname{cond}(A, x)\, h_1(\rho_0) + \operatorname{cond}^*(A, x^*)\, r\, h_2(\rho_0). \qquad (3)$$

In the above bounds we denote:

$$r = \max\{\|R\|_\infty, 1\}, \quad K_1 = \max\{k_1, 1\}, \quad K_2 = \max\{k_2, 1\},$$
where $k_1$ bounds the growth of elements when we obtain the LU factorization
of $A_{11}$ (Stage 1), $k_2$ bounds the growth of elements in the Gaussian elimination
for the reduced system (Stage 3), and



$$f(\rho_0) = \gamma_{j+1} + \gamma_{2j+1}, \quad g(\rho_0) = \gamma_{j+1} + \rho_0, \quad h(\rho_0) = \gamma_{2j-1} + \ldots$$



where $\gamma_n = n\rho_0/(1 - n\rho_0)$, and

$$N = \begin{pmatrix} 0 & R \\ 0 & 0 \end{pmatrix}.$$

The condition number $\operatorname{cond}^*(A, x^*)$ is defined as

$$\operatorname{cond}^*(A, x^*) = \frac{\|\, |A^{-1}|\, |A|\, x^* \|_\infty}{\|x\|_\infty},$$

where the vector $x^*$ is constructed in the following way:

$$x^* = \left( \|x_k\|_\infty e,\ |x_k|,\ \max\{\|x_k\|_\infty, \|x_{2k}\|_\infty\}\, e,\ \ldots,\ |x_{(s-1)k}|,\ \max\{\|x_{(s-2)k}\|_\infty, \|x_{(s-1)k}\|_\infty\}\, e \right)^T.$$

Here $e = (1, 1, \ldots, 1) \in \mathbb{R}^{k-j}$. The other condition number is known as
Skeel's condition number:

$$\operatorname{cond}(A, x) = \frac{\|\, |A^{-1}|\, |A|\, |x| \|_\infty}{\|x\|_\infty}.$$

The condition number $\operatorname{cond}^*(A, x^*)$ is introduced to make the obtained bounds
more realistic in some cases. As we see in the bound (3) on the forward error, the
condition number $\operatorname{cond}^*(A, x^*)$ is multiplied by the factor $r$ (which can be large
sometimes) while the condition number $\operatorname{cond}(A, x)$ is not. So, when $\operatorname{cond}^*(A, x^*)$
is small the influence of $r$ should be negligible.



3 Special Classes of Matrices 

In this section we consider more precisely the case when matrix A belongs to 
one of the following classes: diagonally dominant, symmetric positive definite, 
or M-matrices. 

For the following bounds of $\|R\|_\infty$ and $k_2$ we need to analyze what is the
type of the reduced matrix $S$ if matrix $A$ belongs to one of the above mentioned
classes. First we analyze the type of $S$ in exact arithmetic because we need this
to bound $\|R\|_\infty$. Then at the end of this section we consider the implementation
with roundoff errors and comment on the growth of the constant $k_2$.

First we use the well known fact (see [1, p. 94] and [1, p. 209], respectively) that
if matrix $A$ is either

— symmetric positive definite, or 

— a nonsingular M-matrix, 
then the reduced matrix $S$ (the Schur complement) preserves the same property.

It remains to prove that when $A$ is a diagonally dominant matrix then $S$
preserves this property. Let us note that the case when $A$ is a block row diagonally
dominant matrix is considered in [4, p. 252]. Here for diagonally dominant
matrices $A = \{a_{ij}\}$ we assume row diagonal dominance in the sense that

$$\sum_{j \ne i} |a_{ij}| \le |a_{ii}|, \quad i = 1, 2, \ldots, n,$$

which is wider than the class of block row diagonally dominant matrices analyzed in [4,
p. 252].

Theorem 1. Let $A \in \mathbb{R}^{n \times n}$ be a nonsingular row diagonally dominant band
matrix. Then the reduced matrix $S$ (the Schur complement) preserves the same
property.

Proof. Let us construct the matrix $(B_i, a_i, c_i)$. It is obvious that this
matrix possesses the property of row diagonal dominance. Now the question is
whether it preserves the property of row diagonal dominance when Gaussian
elimination is applied to matrix $B_i$. A similar problem in the case of an
arbitrary dense matrix is studied in [3], where it is shown that the property of
diagonal dominance is preserved after forward Gaussian elimination. For the
backward Gaussian elimination it follows in an analogous way that the same
property is preserved. Hence it is true that this matrix preserves the property
of row diagonal dominance when the forward Gaussian elimination and back
substitution are applied to matrix $B_i$. Let us denote the result of this phase as

= (4) 



Now we will prove that the reduced matrix S also preserves the property of 
row diagonal dominance. Let us consider an arbitrary l-th row of S: 



0,...,0,u\ 



(b 



Ai) 



.V 



(0 (0 
^ ^ w\’ 



0 0 



where without loss of generality it is assumed that the diagonal element is 
Then from (2) for the entries of S we obtain (for simplicity some of the indexes 
are omitted): 






where a^'‘\ b^’‘\ S p , p ■+ , From the fact that 

is a row diagonally dominant matrix it follows that 



(5) 

g(2) 



XI jp^l +X1 l9*-l ^ (6) 

2=1 2=1 



(7)

Let us introduce the vector $e = (1, 1, \dots, 1)^T$ of size j. Then from (5),
(6) and (7) we obtain

li'il > 

> |i,S«| - |a<‘)| (e-± - ± - |c<‘l| (e - t IrSl ~ E 1‘lSl') 

\ i=2 / \ i=l i=2 / 

> - |a(')|e+ | ^ |(?|!^ | 

i—2 i—1 

|e + I \pfl I + I ^ k-+ 1 

i—1 i—2 

i—2 i—2 i—1 i—1 

But A is a row diagonally dominant matrix, i. e. 

1^® I - Y I “ I® “ I® - 

i^2 

Then from (8) and (9) we get 

iTi>EK“’i + Ei«fi + EK‘'’i. 

i—2 i—1 2—1 

Hence the reduced matrix S is row diagonally dominant. 
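
The statement of Theorem 1 can be checked numerically on a small example. The sketch below is only an illustration under simplifying assumptions: it uses a dense 4 x 4 matrix and the Schur complement of a single elimination step, not the block reduction of the generalized Wang's algorithm, and the function names are hypothetical.

```python
def is_row_diag_dominant(a):
    """Return True if sum_{j != i} |a_ij| < |a_ii| holds for every row i."""
    n = len(a)
    return all(
        sum(abs(a[i][j]) for j in range(n) if j != i) < abs(a[i][i])
        for i in range(n)
    )


def schur_complement_first_pivot(a):
    """Eliminate the first unknown: S = A22 - a21 * a12 / a11."""
    n = len(a)
    pivot = a[0][0]
    return [
        [a[i][j] - a[i][0] * a[0][j] / pivot for j in range(1, n)]
        for i in range(1, n)
    ]


# A small strictly row diagonally dominant test matrix (illustrative data).
A = [
    [4.0, -1.0, 0.0, 1.0],
    [1.0, 5.0, -2.0, 0.0],
    [0.0, 2.0, -6.0, 1.0],
    [1.0, 0.0, 1.0, 3.0],
]
S = schur_complement_first_pivot(A)
dominance_preserved = is_row_diag_dominant(A) and is_row_diag_dominant(S)
```

Repeating the elimination step on S gives the successive reduced matrices, each of which can be tested in the same way.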

As we saw from (3), the error bound depends not only on the growth factors
$K_1$ and $K_2$, but also on the quantity r, which measures the growth in the
matrix R. Clearly, when some of the blocks $B_i$ are ill conditioned (although
the whole matrix A is well conditioned) the factor r can be large. This leads
to large errors even for well conditioned matrices. So we need some bounds for
r or, equivalently, for $\|R\|_\infty$. In the following we show that
$\|R\|_\infty$ is bounded by constants of moderate size for the three classes
of matrices mentioned above.

The proofs of the next four theorems are similar to the proofs of Theorems
5-8 in [8].

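Theorem 2. Let $A \in \mathbb{R}^{n \times n}$ be a nonsingular banded
M-matrix and $k_1 \mathrm{cond}(A) f(\rho_0) < 1$. Then it is true that

$$\|R\|_\infty \le \frac{\mathrm{cond}(A)}{1 - k_1 \mathrm{cond}(A_{11}) f(\rho_0)}
\le \frac{\mathrm{cond}(A)}{1 - k_1 \mathrm{cond}(A) f(\rho_0)}.$$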

Theorem 3. Let $A \in \mathbb{R}^{n \times n}$ be a nonsingular, row diagonally
dominant banded matrix, and $k_1 \mathrm{cond}(A) f(\rho_0) < 1$. Then we have

“ 1 - kicond{Aii)f{po) ~ 1 - 2kicond{A) f (po) ' 
Theorem 4. Let $A \in \mathbb{R}^{n \times n}$ be a symmetric positive definite
banded matrix and $k_1 (k-1) \mathrm{cond}_2(A) f(\rho_0) < 1$, where
$\mathrm{cond}_2(A) = \|A^{-1}\|_2 \|A\|_2$. Then we have

$$\|R\|_\infty \le \frac{\sqrt{j(s-1)}\,\mathrm{cond}_2(A)}{1 - k_1 \mathrm{cond}(A_{11}) f(\rho_0)}
\le \frac{\sqrt{j(s-1)}\,\mathrm{cond}_2(A)}{1 - k_1 (k-1) \mathrm{cond}_2(A) f(\rho_0)}.$$



Theorems 2-4 show that $\|R\|_\infty$ is bounded by constants of moderate size
for the three classes of matrices, if the whole matrix A is well conditioned.
In order to bound $k_2$ we can use the bounds already obtained for Gaussian
elimination in [4, p. 181, p. 206, p. 198], the properties of the matrix S
cited at the beginning of this section, and Theorem 1. However, in practice we
obtain the computed matrix $\hat{S}$ instead of the exact one. It is important
to know the distance between $\hat{S}$ and S. This question is answered in
Theorem 5.



Theorem 5. For the error $\Delta S = \hat{S} - S$ in the computed reduced
matrix $\hat{S}$ it holds that

$$\frac{\|\Delta S\|_\infty}{\|A\|_\infty} \le K_1 \mathrm{cond}(A)\, r\, f(\rho_0).$$



So, our conclusion of this section is that the algorithm is numerically stable 
for the considered three classes of matrices. 

Unfortunately, when the matrix of the system does not belong to the
above-mentioned classes, the algorithm can break down or behave poorly. In our
paper we also present a stabilized version of the generalized Wang's algorithm
for banded linear systems.



4 The Stabilized Algorithm 

As noted in the previous section, the algorithm can break down, or behave
poorly, when the pivot elements encountered for $i = 1, \dots, s(k-j)$ are
zero or small. So we can perturb them in such a way that they stay away from
zero. The stabilization step can be summarized as follows:

    if $|u_i^{(1)}| < \delta$
        if $u_i^{(1)} = 0$
            $u_i^{(1)} = \delta$;
        else
            $u_i^{(1)} = u_i^{(1)} + \mathrm{sign}(u_i^{(1)})\,\delta$;
        end
    end



In this way the perturbed entries are shifted away from zero. Hence the
algorithm ensures that we do not divide by a small number.
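
The stabilization step above can be sketched as a small function. This is a minimal illustration, assuming a scalar entry `u` and a threshold `delta`; the function name is hypothetical and not taken from the paper.

```python
import math


def stabilize_pivot(u, delta):
    """Shift an entry away from zero if its magnitude is below delta.

    A zero entry is replaced by delta; a small nonzero entry is moved
    further from zero by delta in the direction of its own sign.
    Entries with |u| >= delta are returned unchanged.
    """
    if abs(u) < delta:
        if u == 0.0:
            return delta
        return u + math.copysign(delta, u)
    return u
```

After this step every returned value has magnitude at least `delta`, so a subsequent division by it is safe from overflow caused by a tiny divisor.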

On the other hand, the solution obtained in this way is perturbed. We then
apply the usual iterative refinement from [3], with a modification:

    $x^{(0)} = \tilde{x}$;
    for m = 1, 2, ...
        $r^{(m-1)} = b - A x^{(m-1)}$;
        solve $(A + \Delta)\, y^{(m)} = r^{(m-1)}$;
        $x^{(m)} = x^{(m-1)} + y^{(m)}$;
    end



The difference here is that, instead of systems with A, we solve perturbed
systems with the matrix $A + \Delta$, where $\Delta$ is a diagonal matrix
containing all such perturbations, and $\tilde{x}$ is the result of the
perturbed algorithm before the iterative refinement is applied. We note that
when $\delta = \sqrt{\rho_0} \approx 10^{-8}$ (in double precision), in
practice the perturbed solution is very close to the exact one and we usually
need only one or two steps of iterative refinement, depending on the accuracy
required. Here $\rho_0$ denotes the machine roundoff unit.
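
The effect of refining with a perturbed matrix can be illustrated on a tiny dense system. The sketch below makes several assumptions that are not in the paper: a generic dense Gaussian elimination (`solve_dense`) stands in for the banded solver, the perturbation is applied to every diagonal entry, and the residual is computed with the unperturbed matrix.

```python
def solve_dense(a, b):
    """Gaussian elimination with partial pivoting (stand-in for the band solver)."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(m[r][k]))
        m[k], m[p] = m[p], m[k]
        for r in range(k + 1, n):
            factor = m[r][k] / m[k][k]
            for c in range(k, n + 1):
                m[r][c] -= factor * m[k][c]
    x = [0.0] * n
    for k in range(n - 1, -1, -1):
        s = sum(m[k][c] * x[c] for c in range(k + 1, n))
        x[k] = (m[k][n] - s) / m[k][k]
    return x


def refine(a, a_perturbed, b, steps):
    """Iterative refinement where every solve uses the perturbed matrix."""
    x = solve_dense(a_perturbed, b)  # starting guess: the perturbed solution
    for _ in range(steps):
        # Residual with respect to the original matrix A.
        r = [bi - sum(aij * xj for aij, xj in zip(row, x)) for row, bi in zip(a, b)]
        y = solve_dense(a_perturbed, r)  # correction from (A + Delta) y = r
        x = [xi + yi for xi, yi in zip(x, y)]
    return x


A = [[2.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
delta = 1e-8
A_pert = [[A[i][j] + (delta if i == j else 0.0) for j in range(3)] for i in range(3)]
b = [3.0, 5.0, 3.0]  # chosen so that the exact solution of A x = b is (1, 1, 1)

err0 = max(abs(v - 1.0) for v in refine(A, A_pert, b, steps=0))
err2 = max(abs(v - 1.0) for v in refine(A, A_pert, b, steps=2))
```

On this well conditioned example the error of the perturbed solution is of the order of `delta`, and two refinement steps reduce it to roundoff level, in line with the remark above that one or two steps usually suffice.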

Taking into account [6], the condition for convergence of the iterative
refinement is

$$C\,\mathrm{cond}(A)\,\delta < 1,$$

where cond(A) is a condition number of the matrix A and C is a constant of the
following kind:

$$C = \frac{\max_i (|A||x|)_i}{\min_i (|A||x|)_i}, \qquad i = 1, 2, \dots, n.$$

A number of numerical experiments which confirm the theoretical results and
the effectiveness of the stabilized algorithm are available from the author.

Abstract. We describe a Maple package named D-NODE (Distributed
Numerical solver for ODEs), implementing a number of difference methods
for initial value problems. The distribution of the computational
effort follows the idea of parallelism across the method. We have
benchmarked the package in a cluster environment. Distributed Maple
ensures the inter-processor communications. Numerical experiments show that
parallel implicit Runge-Kutta methods can attain speed-ups close to the
ideal values when the initial value problem is stiff and has between ten
and a hundred equations. The stage equations of the implicit methods are
solved on different processors using Maple's facilities.

Keywords: parallel numerical methods for ordinary differential equa- 
tions, distributed computer algebra systems, performance analysis. 



1 Introduction 

We are concerned with the numerical solution of systems of initial value
ordinary differential equations (IVPs for ODEs) of the form

$$y'(t) = f(t, y(t)), \quad t \in [t_0, t_0 + T], \quad y(t_0) = y_0, \quad y_0 \in \mathbb{R}^m. \qquad (1)$$

In the numerical solution of ordinary differential equations by implicit
time-stepping methods, a system of linear or nonlinear equations has to be
solved at each step. The cost of the linear algebra associated with the
implementation of the implicit equation solver generally dominates the overall
cost of the computation.

The numerical integration of large IVPs is also time consuming. Such large
(and stiff) problems often arise in the modeling of mechanical and electrical
engineering systems or in the solution of semi-discretizations of
convection-diffusion problems [7] associated with time-dependent parabolic
PDEs. The stiffness of these problems requires that the numerical methods used
be unconditionally stable, and therefore implicit. The methods are
computationally demanding and require today's fastest high performance
computers for practical implementations. However, access to a fast computer is
not sufficient. One must also ensure that the great potential power of the
computer is correctly exploited.

The aim of this paper is to investigate to what extent parallel implicit
Runge-Kutta methods can be used to solve stiff initial value problems of ten
to a hundred equations using Distributed Maple. The stage systems to be solved
are distributed among the processors of a cluster system. Tables and figures
illustrate the performance of the implemented methods.

The paper is organized as follows. Section 2 motivates the present work.
Section 3 describes the objectives of a Maple package named D-NODE (Distributed
Numerical solver for ODEs), implementing a number of difference methods
designed around the idea of parallelism across the method. In Section 4 we
report the numerical results obtained using some known parallel implicit
Runge-Kutta methods. We have benchmarked the package in a cluster environment.

2 On IVP Solving Strategies 

One iterative step of many implicit schemes for IVPs of the form (1) requires
the solution of a system of algebraic equations of the form

$$Y - h(C \otimes I_m)F - G(Y_0) = 0 \qquad (2)$$

with h the step-size, C an $s \times s$ matrix, $I_m$ the identity matrix, G a
known function, $F = (f_1, \dots, f_s)^T$, $f_i = f(t_i, y_i)$, and
$Y = (y_1, \dots, y_s)^T$ the unknown approximations to the exact solution at
$t_1, \dots, t_s$. It is common practice to use fixed-point iterations or, in
the stiff case, some modified Newton iterations. The convergence rate of such
methods depends on the step-size of the method. Implicit Runge-Kutta schemes
(IRK) are among the numerical techniques commonly considered efficient in the
stiff IVP case. The use of an s-stage IRK method for ODEs requires the solution
of nonlinear systems of algebraic equations of dimension sm (with m defined in
(1)). Usually, the solution of this system is the most time-consuming part of
the implementation of such a method.

A general way of devising parallel ODE solvers is to consider methods whose
work per step can be split over a certain number of processors. The so-called
solvers with parallelism across the method are then obtained. Such methods are
essentially Runge-Kutta schemes. For a parallel implicit Runge-Kutta method the
system (2) can be split into a number k < s of independent subsystems. From the
computational point of view, the diagonally implicit RK methods (DIRK methods)
are the most attractive, since they have suitable stability properties and
their implementation can be carried out at a lower computational cost than
that of fully implicit RK methods. Block diagonally implicit RK methods
(BDIRK) are also used. The so-called PDIRK methods are parallel diagonally
iterated RK methods. The computational cost involved in their implementation
is similar to that of DIRK methods. PDIRK methods are able to produce accurate
results at a relatively high price. Unfortunately, these methods are not the
most suitable for solving semi-discretized PDEs, for which it is necessary to
generate relatively low-accuracy results at a low price [2]. The construction
of some PDIRK methods using Maple is presented in [4]. Parallel singly
diagonally iterated RK methods (PSDIRK) are particular methods of PDIRK type.

Computer algebra systems (CAS) can be used with success in prototyping
sequential algorithms for the symbolic or numeric solution of mathematical
problems. Maple is such a CAS. Constructing prototypes in Maple for parallel
algorithms for the numeric solution of ODEs is a challenging problem.
Distributed Maple [12] is a portable system for writing parallel programs in a
CAS, which allows one to create concurrent tasks and have them executed by
Maple kernels running on different machines of a network. The system can be
used in any network environment where Maple and Java are available. The user
interacts with the system via the text-oriented Maple front-end. It also
provides facilities for the online visualization of load distribution and for
post-execution analysis of a session.

We know that solving systems of several hundred algebraic or differential
equations can be an unsolvable problem for a current CAS. Systems of several
tens of equations can be solved with Maple, but the long running time may be a
serious problem for the user. A correct use of extensions like Distributed
Maple can reduce the computation time of the solution. The computer facilities
required by such an extension are reasonable for any user, since it does not
require access to supercomputers.

In general, the system (2) is solved numerically using repeated evaluations
of the function F at different values (in the case of a stiff system, some
repeated Jacobian matrix evaluations are also required). In a message-passing
computing environment these values must be communicated between the different
processors participating in a time-step integration of the IVP. Sending to a
working processor the algebraic expressions of the part of F for which it is
responsible can be a better solution, eliminating a significant quantity of
values to be communicated between the supervisor processor storing F and the
worker processors. The interpretation of an algebraic expression requires, on
the worker processor side, at least a small specific expression interpreter
(like the Maple kernel).

The implicit equation solver can substantially affect the global error of the
numerical solution of an IVP. Take for example the fixed-point iterations,
which in the stiff IVP case usually do not converge to the exact solution of
the system (2). If we use fixed-point iterations while ignoring this remark
and without using error control strategies, we can obtain a numerical solution
far from the real solution. In the practical implementation of implicit
time-stepping methods, the hardest parts are the implementation of the
implicit equation solver and the error control mechanism combined with
variable step-size strategies. Using the numerical facilities of a CAS to do
the first job can simplify the programmer's work.
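
The failure of fixed-point iterations in the stiff case can be seen on a scalar example. The sketch below is an illustration only, not part of D-NODE: it applies one step of the implicit Euler method to the linear test equation y' = lam*y, for which the stage equation is Y = y0 + h*lam*Y, and compares fixed-point iteration with a Newton iteration. The parameter values are chosen for illustration.

```python
def implicit_euler_fixed_point(y0, lam, h, iterations):
    """Fixed-point iteration Y <- y0 + h*lam*Y for the implicit Euler stage."""
    y = y0
    for _ in range(iterations):
        y = y0 + h * lam * y
    return y


def implicit_euler_newton(y0, lam, h, iterations):
    """Newton iteration on the residual g(Y) = Y - y0 - h*lam*Y."""
    y = y0
    for _ in range(iterations):
        residual = y - y0 - h * lam * y
        jacobian = 1.0 - h * lam  # g'(Y), constant for this linear problem
        y -= residual / jacobian
    return y


lam, h, y0 = -1000.0, 0.1, 1.0           # stiff case: |h*lam| = 100 > 1
exact_stage = y0 / (1.0 - h * lam)        # exact solution of the stage equation
fp = implicit_euler_fixed_point(y0, lam, h, 10)
nt = implicit_euler_newton(y0, lam, h, 1)
```

Because the iteration map has slope h*lam, the fixed-point iterates grow in magnitude whenever |h*lam| > 1, while Newton's method solves this linear stage equation in a single step.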

We propose the use of the implicit equation solver of Maple for the solution
of system (2). In the case of parallel IRK methods, independent stage
subsystems in Maple algebraic form are sent to some worker processors, which
solve them.

3 D-NODE Objectives 

The Maple package D-NODE (Distributed Numerical solver for ODEs) is intended
to be an update to the ODEtools Maple package. It implements a number of
difference methods designed around the idea of parallelism across the method
[15]. The package is part of a bigger project, an expert system for the
numerical solution of ODEs [10], and it is expected to be finalized at the end
of this year.

The facilities of ODEtools from Maple, and the similar tools from other CAS,
are far from covering all user needs (for example, the stiff IVP case). Recent
reports demonstrate the effort to improve these tools. For example, the paper
[13] describes mathematical and software developments for a suite of programs
for solving ODEs in Matlab.

The D-NODE package has facilities similar to those of EpODE (ExPert system for
ODEs), recently presented in [10] and available at
http://www.info.uvt.ro/~petcu/epode: a large collection of parallel methods
working in a distributed computing environment, automatic detection of method
properties including method classification, order, error constant and
stability, degree of parallelism, a method interpreter for describing new
methods, automatic detection of problem properties (like stiffness), a
step-size selection mechanism according to the method and problem properties,
and computation of the numerical solution on a distributed network of
workstations (in EpODE based on PVM [8]).



4 Numerical Experiments 



This section is devoted to the interpretation of the test results in the integration 
of large non-linear ODE systems and to the comparisons with the test results of 
other similar tools (one of them being EpODE, part of the same project [9]). 

We consider four methods, each representative of its class of parallel IRK
methods, which were included in D-NODE. We have benchmarked the corresponding
package functions in a cluster environment. The cluster comprises 4
dual-processor Silicon Graphics Octanes (2 R10000 processors at 250 MHz each)
linked by three 10 Mbit Ethernet subnetworks connected to a router.

The first scheme is the 4-stage, 2-processor, 4th-order, A-stable DIRK method
described in [6]. The second one is a 6-stage, 2-processor, 3rd-order, A-stable
PDIRK method based on the Radau IIA corrector and presented in [14]. The third
one is the 4-stage, 2-processor, 4th-order, L-stable Hammer-Hollingsworth BDIRK
method [5]. The last one is the 9-stage, 3-processor, 4th-order, A-stable
PSDIRK method presented in [2]. Details about these methods can also be found
in [11].

The degree of parallelism of a method can be detected by applying the direct- 
graph method proposed in [5]. Figure 1, generated by EpODE, presents the pro- 
posed distributions of the computations on processes and parallel stages for the 
above mentioned methods. 

In order to show the performance of the methods on semi-discrete PDEs, we
include in our tests the linear IVP obtained from the following PDE [13]:

$$\frac{\partial u}{\partial t} = \frac{\partial^2 u}{\partial x^2}, \quad
x \in [0, \pi], \quad t \in [0, 10], \quad
u(x, 0) = \sin(x), \quad u(0, t) = u(\pi, t) = 0. \qquad (3)$$

Fig. 1. Data-flow graphs reported by EpODE [10]: from left to right and from
top to bottom, the four methods. An arc from the left part of a circle means
a dependency from the top variable to the bottom variable, an arc from the
right part of a circle means a dependency from the bottom to the top variable
when starting the next integration step, and an almost horizontal arc indicates
an interdependence between the two linked variables. A k-labeled node refers to
the solving procedure for obtaining the value of the variable k using the
previously computed (above) labeled node values. More than one labeled node in
a computational cell indicates that a system formed with those variables must
be solved



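As the second test problem we take the nonlinear IVP obtained by the
semi-discretization of the following nonlinear convection-diffusion problem [1]:

$$\frac{\partial u}{\partial t} = u\,\frac{\partial^2 u}{\partial x^2}
- x\cos(t)\,\frac{\partial u}{\partial x} - x^2 \sin(t), \quad
x \in [0, 1], \quad t \in [0, 1],$$
$$u(0, t) = 0, \quad u(1, t) = \cos(t), \quad u(x, 0) = x^2.$$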



In order to solve both problems, we carry out a semi-discretization in the
spatial variable by using second-order symmetric differences on a uniform grid
with mesh size $\Delta x = 1/(m+1)$. This method (of lines) leads to IVPs with
m ODEs. As the third problem we take a real one. The selected PLEI problem [3]
(28 ODEs) is the celestial mechanics problem of seven stars. Similar IVPs have
been studied in [7] for the case of a shared-memory parallel computer.
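
The semi-discretization of the linear test problem can be sketched as follows. This is an illustrative reconstruction, not the D-NODE code: the function name is hypothetical, and the mesh size is taken as pi/(m+1) on the assumption that the m interior grid points cover the interval [0, pi] of problem (3).

```python
import math


def heat_rhs(u, dx):
    """Right-hand side of the semi-discrete heat equation.

    Implements u_i' = (u_{i-1} - 2*u_i + u_{i+1}) / dx**2 for the m interior
    points, with homogeneous Dirichlet values u_0 = u_{m+1} = 0.
    """
    m = len(u)
    rhs = []
    for i in range(m):
        left = u[i - 1] if i > 0 else 0.0
        right = u[i + 1] if i < m - 1 else 0.0
        rhs.append((left - 2.0 * u[i] + right) / dx**2)
    return rhs


m = 20
dx = math.pi / (m + 1)                    # assumption: uniform grid on [0, pi]
x = [(i + 1) * dx for i in range(m)]
u0 = [math.sin(xi) for xi in x]           # initial condition u(x, 0) = sin(x)
du0 = heat_rhs(u0, dx)

# For u = sin(x) the exact second derivative is -sin(x), so du0 should be
# close to -u0 up to the second-order discretization error.
max_deviation = max(abs(d + v) for d, v in zip(du0, u0))
```

The resulting system of m ODEs is linear with a tridiagonal Jacobian whose eigenvalues grow like 1/dx**2, which is the source of the stiffness for larger m.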

Figure 2, obtained by using the visualization procedures of Distributed Maple,
shows the ratio between sequential and distributed time measurements
corresponding to one or more arbitrary integration steps. The first vertical
block of each figure corresponds to one sequential integration, and the second
one to the distributed integration. A horizontal line corresponding to a local
or remote processor indicates the time when that processor is busy (continuous
tasks). The time in seconds reported in the bottom-right corner of each figure
represents the total time, the sequential one plus the distributed one. The
time difference between a local and a remote task can be explained by the fact
that the local processor must compute explicitly the approximate solution
$y_{n+1}$ (y in Figure 1) from the computed Y vector (x and k), must send the
tasks to the other processors and must prepare the algebraic systems to be
solved.

Fig. 2. Time diagrams and processor load per k integration steps: left figure
for the first method and the linear problem with m = 60 and k = 5 steps, right
figure for the fourth method and the nonlinear problem with m = 20 and k = 1
step

Figure 3, also produced by Distributed Maple, offers more details about
the load balancing between the running processes. Analyzing the top images, we
see that small linear IVPs (at least for our test problem with m = 10 to 20
equations) cannot be integrated in a distributed computational environment
faster than on a sequential computer, since the distributed tasks are small
relative to the overall time spent in one distributed integration step
(including the necessary communications). In the case of nonlinear problems of
similar dimensions, almost all computation time is spent on computing stage
solutions (continuous horizontal lines).

The efficiency measurements of the distributed implementation of the selected
methods are briefly presented in Table 1. The vertical lines split the
inefficient values (left) from the efficient values (right). We can arrange the
analyzed methods in increasing order of trust according to the order in which
they attain the vertical lines: DIRK, PDIRK, BDIRK (we must prefer the BDIRK
method). These methods appear in the reverse order if we sort them by the
moment when they complete a time-step (DIRK is the fastest one). Therefore
supplementary parameters (like the recommended step-size) must be taken into
account when we select a distributed method. We also see that the 3-processor
PSDIRK method can be almost as efficient as a two-processor method when we
solve a nonlinear problem.

Fig. 3. Load balancing for k integration steps: PSDIRK for the linear problem
with m = 20 and k = 10 steps (top-left), with m = 10 and k = 10 (top-right),
and for the nonlinear real problem with m = 28 and k = 1 (bottom-left), and the
DIRK method for the nonlinear problem with m = 25 and k = 1 (bottom-right)

Table 1. Efficiency results $E_p = T_s/(p T_p)$ (in percent), where p is the
number of processors and $T_s$, $T_p$ are the mean times necessary to perform
one integration step using one and p processors, respectively

                   Linear problem               Nonlinear problem           Real
Method   p    m=10   m=20   m=40   m=60    m=5    m=10   m=15   m=20    m=28
DIRK     2    9.84   25.38  53.85  81.25   24.62  49.97  86.11  94.12   97.00
PDIRK    2    13.96  29.05  66.66  76.72   28.26  51.43  87.38  97.00   94.12
BDIRK    2    34.48  63.08  86.57  88.32   63.08  92.00  99.98  99.99   95.54
PSDIRK   3    10.18  22.67  50.43  67.70   24.31  51.88  81.96  99.10   97.33
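
The efficiency measure of Table 1 is straightforward to evaluate from timing data. The helper names below are hypothetical, and the conversion from efficiency back to speed-up simply inverts the definition $E_p = T_s/(p T_p)$.

```python
def efficiency_percent(t_seq, t_par, p):
    """Efficiency E_p = T_s / (p * T_p), expressed in percent."""
    return 100.0 * t_seq / (p * t_par)


def speedup_from_efficiency(e_percent, p):
    """Speed-up T_s / T_p implied by an efficiency value given in percent."""
    return e_percent * p / 100.0


# Table 1 entry: DIRK, linear problem, m = 60, p = 2 processors.
dirk_speedup = speedup_from_efficiency(81.25, 2)
```

For the DIRK entry with m = 60 this gives a speed-up of 1.625 on two processors, while an efficiency below 50 percent with p = 2 means that the distributed run is slower than the sequential one.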

We cannot expect to obtain similar efficiency results when we use explicit 
Runge-Kutta methods (or explicit multistep methods), since the solution of a 
stage equation involves only a small number of function evaluations and vector 
operations. 

Comparing the above efficiency results with those reported in [9] for similar
problems using EpODE, written in C, we note here a lower threshold in IVP size
between efficient and inefficient implementations of distributed solvers. This
is due to the implicit equation solver implemented in Maple, which is more
time-consuming than some modified Newton iterations written in a programming
language like C. On the other hand, we can have more trust in the Maple
solution of the implicit equation system. Using the accurate solution of the
implicit stage equations produced by Maple, we can apply the error control
strategies for ODE solvers often reported in the literature (usually the great
influence of the implicit equation solver on the global error of the numerical
ODE solution is neglected).

5 Conclusions 

D-NODE, a Maple package using Distributed Maple, extends the numerical ODE
solving capabilities of Maple to systems of tens or hundreds of equations by
exploiting the computational power of a local network of workstations. A
strategy was adopted in which parts of some nonlinear systems to be solved at
each time-step are sent in algebraic form to the workers. The solution accuracy
compensates for the supplementary time required by this non-classical
procedure. Efficiency measurements indicate that the parallel implicit
Runge-Kutta methods are well suited to this strategy.

Abstract. We give the formulation of the Von Mises problem of the
boundary layer of triple deck type. An original non-local condition
appears. We prove the existence of a solution by studying a semi-discrete
scheme in which we consider the pressure gradient as a parameter. We
then obtain a solution in physical variables, but the condition
$v(x, 0) = 0$ is not proved. Besides, the numerical simulations give a
surprising non-uniqueness result with given pressure in the case of a
break-away.



1 Introduction 

The triple deck model was introduced by Stewartson and Williams [7] in 1969
for supersonic flows. Several other models of this type were introduced
later (see references in [4] and [6]). All these models describe the behaviour
of a Newtonian flow around a perturbation at high Reynolds numbers. In [3] and
[6], we introduce a model for a Couette flow in a channel.




The lower wall is fixed and has a small perturbation. The upper wall is a flat
plate moving with velocity 1 (after nondimensionalization). The entering
velocity profile is U(X, Y) = Y. The sizes of the perturbation and of the
associated layers are related to the Reynolds number: see Figure 1, where we
set $Re = \varepsilon^{-m}$. More precisely, the pair $(m, \alpha)$ must
satisfy $-m < \alpha < 0$.

The boundary layer of triple deck type is the inner layer or deck (size
$\varepsilon^{(m+\alpha)/3}$) in Figure 1. We isolate a canonical problem for
the boundary layers of this type and we show how to solve it.

2 The Canonical Problem 

We now use the inner variables of the boundary layer. Let x and y be the
longitudinal and transversal coordinates, u and v the longitudinal and
transversal velocities and p the pressure. The canonical problem consists of
the Prandtl equations

$$u\,\frac{\partial u}{\partial x} + v\,\frac{\partial u}{\partial y}
= -\frac{\partial p}{\partial x} + \frac{\partial^2 u}{\partial y^2}, \qquad (1)$$

$$\frac{\partial u}{\partial x} + \frac{\partial v}{\partial y} = 0, \qquad (2)$$

$$\frac{\partial p}{\partial y} = 0, \qquad (3)$$

the initial condition

$$u(0, y) = y \qquad (4)$$

and the boundary conditions

$$\lim_{y \to +\infty} \bigl(u(x, y) - y\bigr) = A_d(x) \qquad (5)$$

and

$$u(x, 0) = v(x, 0) = 0. \qquad (6)$$

The data is the displacement $A_d$ and the unknowns are u, v on
$[0, x_0] \times [0, +\infty[$ and p on $[0, x_0]$ (so that (3) is already
taken into account).

In (5), the term y represents the non-perturbed velocity profile. The Prandtl
transformation

$$\bar{x} = x, \quad \bar{y} = y + A_d(x), \quad
\bar{u}(\bar{x}, \bar{y}) = u(x, y), \quad
\bar{v}(\bar{x}, \bar{y}) = v(x, y) + A_d'(x)\,u(x, y), \quad
\bar{p}(\bar{x}) = p(x) \qquad (7)$$

does not change (1)-(4) and allows us to interpret $A_d$ as a geometrical
perturbation. Then (1)-(6) becomes exactly the problem of the Poiseuille or of
the Couette flow. The pressure never appears as a datum in the physical
problems.

There is no direct relation between $A_d$ and p, as will appear in the sequel.
The situation is quite different from that of the classical Prandtl problem,
where p is given and where the condition which replaces (5) is automatically
satisfied [2].

As is always done for the Prandtl problem, we use the Von Mises transformation
defined by the change of variables and the change of functions

$$\xi(x, y) = x, \qquad \psi(x, y) = \int_0^y u(x, t)\,dt \qquad \text{and} \qquad
w(\xi, \psi) = u^2\bigl(x(\xi, \psi), y(\xi, \psi)\bigr).$$

We solve the problem in physical variables by first solving the Von Mises
problem and then applying the inverse Von Mises transformation. We therefore
look for positive solutions.
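
As a consistency check, the effect of this change of variables on the Prandtl equations can be worked out directly. The following is a sketch of the standard computation, assuming $w = u^2$, a pressure gradient $p'$ depending on x only, and $\partial\psi/\partial x = -v$ (which follows from the continuity equation together with $v(x, 0) = 0$); it is not taken from the paper.

```latex
\begin{align*}
% Initial profile u(0,y) = y expressed in the stream-function variable
\psi(0, y) &= \int_0^y t\,dt = \frac{y^2}{2},
  & w(0, \psi) &= u(0, y)^2 = y^2 = 2\psi, \\
% Derivatives in the new variables
\frac{\partial}{\partial y} &= u\,\frac{\partial}{\partial \psi},
  & \frac{\partial}{\partial x} &= \frac{\partial}{\partial \xi}
     - v\,\frac{\partial}{\partial \psi}, \\
% Convective and diffusive terms
u\,\frac{\partial u}{\partial x} + v\,\frac{\partial u}{\partial y}
  &= u\,\frac{\partial u}{\partial \xi}
   = \frac{1}{2}\,\frac{\partial w}{\partial \xi},
  & \frac{\partial^2 u}{\partial y^2}
  &= u\,\frac{\partial}{\partial \psi}\Bigl(u\,\frac{\partial u}{\partial \psi}\Bigr)
   = \frac{\sqrt{w}}{2}\,\frac{\partial^2 w}{\partial \psi^2}.
\end{align*}
```

Substituting these expressions into the momentum equation and multiplying by 2 gives $\partial w/\partial\xi - \sqrt{w}\,\partial^2 w/\partial\psi^2 = -2p'$, with the initial profile $w(0, \psi) = 2\psi$ and the wall condition $w(\xi, 0) = 0$.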

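The first original result lies precisely in the formulation of the Von Mises
problem, where a new non-local condition appears [4]. It consists of the Von
Mises equation

$$\frac{\partial w}{\partial \xi} - \sqrt{w}\,\frac{\partial^2 w}{\partial \psi^2} = -2p' \qquad (8)$$

and the conditions

$$w(0, \psi) = 2\psi, \qquad (9)$$

$$w(\xi, 0) = 0 \qquad (10)$$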



and 



Am 







dip. 



( 11 ) 



The difficulties of (8)-(9)-(10)-(11) come from the nonlinearity of the
equations, their degeneracy at $\psi = 0$, the semi-infinite domain and the
condition (11) associated with the determination of the additional unknown p.
The results of Oleinik, Nickel, Walter, Fife and Serrin (see references in [1],
[4]) cannot be used.

An original method based on the study of a semi-discrete scheme is developed.
The pressure gradient is considered as a parameter and the problem is solved
by induction. We did not find any method for solving the continuous problem
(8)-(9)-(10)-(11) directly.



2.1 The Problem with Given Pressure 



The sequences of pressure gradients $(p'^n)_{n \ge 0}$ are first considered as
data, and the displacement does not appear. We suppose there exist $M_1 > 0$
and $M_2 \ge 0$ such that $-M_1 \le p'^n \le M_2$ for all n. The problem
considered here consists in finding the sequences $(w^n)_{n \ge 0}$ which are
solutions of the equation

$$\frac{w^n - w^{n-1}}{\Delta\xi} - \sqrt{w^n}\,\frac{d^2 w^n}{d\psi^2} = -2p'^n \qquad (12)$$

and which satisfy

$$w^0(\psi) = 2\psi, \quad w^n(0) = 0 \quad \text{and} \quad |w^n - 2\psi| \text{ bounded}. \qquad (13)$$



The study is particularly difficult for positive pressure gradients. We prove
[4]

Theorem 1. a) Let $k_1 \in ]0, 2[$. There exist $\xi_0 = \xi_0(M_2, k_1) > 0$
and a sequence $(w^n)_{n \ge 0}$ in $C^1([0, +\infty[) \cap C^\infty(]0, +\infty[)$
which is a solution of (12)-(13) and satisfies $w^n \ge k_1 \psi$ for all n
such that $n\Delta\xi \le \xi_0$. Moreover, there exists
$k_2 = k_2(M_1, \xi_0) \ge 2$ such that

$$-2M_2\xi_0 \le w^n - 2\psi \le 2M_1\xi_0 \quad \text{and} \quad
k_1 \le \frac{dw^n}{d\psi} \le k_2. \qquad (14)$$

b) If $M_2 = 0$, any $\xi_0 > 0$ is admissible and $dw^n/d\psi \ge 2$.

Thus, any break-away is avoided on $[0, \xi_0]$.

We obtain this result by solving a regularized truncated problem. This 
method was introduced by Oleinik [2]. However, she uses specific techniques for 
parabolic problems. Here, we use in particular the monotone iteration method. 
We also prove the asymptotic behaviours [4] 

Theorem 2. For all $\gamma > 0$, there exist $k_\gamma = k_\gamma(k_2, \gamma) > 0$
and $h_\gamma = h_\gamma(k_1, k_2, \gamma) > 0$ such that

< max(2 - fci , fc2 - 2) e (15) 

and |u;”-2V’ + 2p”| + (16) 

for all $\psi > 0$ and all n such that $n\Delta\xi \le \xi_0$ if $\Delta\xi \le h_\gamma$.

We then need very precise bounds on the diffusive term
$q^n = \sqrt{w^n}\, d^2 w^n / d\psi^2$. We obtained them only in the case
$-M_1 \le p'^n \le 0$. In the sequel, we therefore focus on this case, which
also corresponds to a specific case for the displacements. Let us set
$p'^0 = 0$. We show [3], [5]

Theorem 3. Let us suppose $-M_1 \le p'^n \le 0$. There exists
$h_0 = h_0(M_1, \xi_0) > 0$ such that



dw^ 

dip 



$$-2M_1 \le \min_{0 \le i \le n} 2p'^i \le q^n \le 0 \qquad (17)$$


and |g”| < 2 MieV^ + 


(18) 



where Mq = max(2Mi,(^o^ + 1)^(16 + Co)) for all n such that nAf^ < Co if 
Af<ho. 

The lower bound of $q^n$ in (17) is optimal and essential for solving the
inverse problem where the displacement is given. Using the boundedness of
$q^n$, we can then obtain estimates depending on $\gamma$ as in (15) and (16).
In (18), we fixed $\gamma$ and we obtain

> -Ml (19) 

for $\psi \ge 6M_q$ and $\Delta\xi \le h_0$. Thus, only the data $M_1$ and
$\xi_0$ appear in (19). This inequality is also important for the inverse
problem.

Fig. 2. Bounds of $q^n$ in the case $-M_1 \le p'^n \le 0$



Remark 1. It would be possible to solve the inverse problem in the general
case, corresponding to pressure gradients of any sign, if
$q^n \le \max_{0 \le i \le n} 2p'^i$ were true. This is established only in the
case of non-decreasing pressure gradients, which is a priori the worst case
[3]. Using a regularized scheme [5], we obtain a bound greater than
$\max_{0 \le i \le n} 2p'^i$ and we cannot apply the method described below.

We also need the uniqueness and the continuity of $w^n$ with respect to
$p'^n$. They are obtained in a class of functions S whose elements satisfy the
boundedness and regularity properties of the solution constructed so far [5].
The boundedness properties are

$$0 \le w^n - 2\psi \le 2M_1\xi_0, \quad 2 \le \frac{dw^n}{d\psi} \le k_2
\quad \text{and} \quad -2M_1 \le q^n \le 0. \qquad (20)$$

Theorem 4. a) There exists $h_1 > 0$ such that the solution $w^n$ of (12)-(13)
is unique in S for all n such that $n\Delta\xi \le \xi_0$ if $\Delta\xi \le h_1$.

b) Let n > 1 such that n Af < Vo (ind let —Mi < P 2 " — Pi” — There 
exists ft -2 G such that 



if Af < /i 2 where Wi and wlf are the solutions in S corresponding to p'l'^ and 
P 2 " respectively and to the same antecedent 

2.2 The Problem with Given Displacement 

We suppose here that the displacement $Ad$ is Lipschitzian and non-decreasing,
$0 \le Ad' \le C$, and verifies $Ad(0) = 0$. We state the inverse problem by introducing
a function which links pressure and displacement.

Let $w^n$ be the solution in $S = S(M, \xi_0)$ of (12)-(13) which corresponds to a
pressure gradient $p'^n$ in $[-M, 0]$ and to a fixed antecedent. After (11), we
define the function $A'^n$ by



Wi < W 2 < Wi + 2{p 



/ n 
1 






(21) 




(22) 
Fig. 3. Function $A'^n(p'^n)$ for a non-decreasing displacement



where 7 ” = + \/w^))~^. This function is well defined. 

Indeed, $w^n$ exists for all $n$ and is unique in $S$, and the first estimate in (20) shows
that the integral (22) exists.

The problem with given displacement consists in finding sequences $(w^n)_{n \ge 0}$
and $(p'^n)_{n \ge 0}$ which solve (12)-(13) and verify $A'^n(p'^n) = Ad'^n$. We
show that the frame $-M \le p'^n \le 0$ is adapted to this problem.

The function $A'^n$ summarizes the whole problem. We prove a local continuity
property and a global coercivity property [3]. The first one follows
from Theorem 4, which implies the continuity of each function $A'^n$ on $[-M, 0]$.
The coercivity property is stated as follows: for all $C > 0$ and all $\xi_0 > 0$, there exists
$M = M(C, \xi_0) > 0$ such that

$$A'^n(0) \le 0 \quad \text{and} \quad A'^n(-M) \ge C \qquad (23)$$

if $-M \le p'^i \le 0$ for $i \le n - 1$ and if $n\Delta\xi \le \xi_0$.

The first inequality in (23) follows directly from (22) and from $q^n \le 0$ if $p'^n \le 0$.
For the second, we use $q^n \ge -2M$ if $p'^n \ge -M$. Indeed, (22) already shows
$A'^n(-M) \ge 0$. The conclusion finally follows using (19). The lower bound of $q^n$
in (17) is essential.

We can then apply the intermediate value theorem and solve
the problem with given displacement by induction. We then obtain sequences
$(p'^n)_{n \ge 0}$ in $[-M, 0]$ and $(w^n)_{n \ge 0}$ in $S$. Then, we can take the limit as $\Delta\xi \to 0$.
The Von Mises problem is entirely solved and we obtain [3]

Theorem 5. Let $Ad$ be a Lipschitzian non-decreasing function verifying $Ad(0) = 0$.
For all $\xi_0 > 0$, there exist a Lipschitzian non-increasing function $p$ and a
Lipschitzian concave function $w \ge 2\psi$, which is once differentiable with respect to
$\xi$ and twice with respect to $\psi$ almost everywhere in the strong sense, such that
(9)-(10)-(11) are verified strongly and such that (8) is verified almost everywhere
in $]0, \xi_0[ \times ]0, +\infty[$. For all $\gamma > 0$, there exists $k > 0$ such that

\w - 2tp + 2p\, \d^w-2\, \y/wd^w\ < . (24) 

The inequalities $p' \le 0$ and $w \ge 2\psi$ were expected. It would be possible
to solve the problem in the general case of Lipschitzian displacements if the
inequality $q^n \le \max_{0 \le i \le n} 2p'^i$ were proved [3].
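The induction step described above relies on a continuous map on $[-M, 0]$ that is nonpositive at one end and at least $C$ at the other, so that it takes every prescribed value in between. This can be illustrated numerically by bisection. The following is a minimal Python sketch, not the authors' method: `toy_map` is a hypothetical stand-in for $A'^n$, and the names `solve_by_bisection`, `lo`, `hi` and `target` are illustrative assumptions.

```python
def solve_by_bisection(func, lo, hi, target, tol=1e-10, max_iter=200):
    """Find p in [lo, hi] with func(p) = target.

    Assumes func is continuous and that func(lo) - target and
    func(hi) - target have opposite signs (or one of them is zero).
    """
    f_lo = func(lo) - target
    f_hi = func(hi) - target
    if f_lo == 0.0:
        return lo
    if f_hi == 0.0:
        return hi
    if f_lo * f_hi > 0.0:
        raise ValueError("target is not bracketed on [lo, hi]")
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        f_mid = func(mid) - target
        if f_mid == 0.0 or (hi - lo) < tol:
            return mid
        if f_lo * f_mid < 0.0:
            hi = mid                 # sign change lies in [lo, mid]
        else:
            lo, f_lo = mid, f_mid    # sign change lies in [mid, hi]
    return 0.5 * (lo + hi)


def toy_map(p):
    # Hypothetical continuous map: toy_map(0) <= 0 and toy_map(-1) >= 1,
    # mimicking the two inequalities of the coercivity property.
    return -2.0 * p - 0.1
```

With $M = 1$ and a prescribed value 0.5, `solve_by_bisection(toy_map, -1.0, 0.0, 0.5)` returns a pressure gradient close to $-0.3$. In the actual problem each evaluation of the map would require solving the discrete scheme for $w^n$.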

Then, we look for a solution in physical variables. The inverse Von Mises
transformation is well defined since $w \ge 2\psi$. We prove its regularity using an
original expression of $y$ in which the displacement appears [3]

= \/^- AdiO+ f / ] dt. (25) 

Theorem 6. Let $Ad$ be a Lipschitzian non-decreasing function verifying $Ad(0) = 0$.
For all $\xi_0 > 0$, there exist a Lipschitzian non-increasing function $p$, a concave
function $u \ge y$ which is once differentiable with respect to $x$ and twice with
respect to $y$ a.e. in the strong sense, and a function $v$ once differentiable with
respect to $y$ a.e. in the strong sense, such that the following holds. The equations
(1)-(2) are verified almost everywhere in $]0, \xi_0[ \times ]0, +\infty[$ and the conditions
(4)-(5)-(6) are verified strongly, except $v(x, 0) = 0$, which is not proved. For all
$\gamma > 0$, there exists $k > 0$ such that

\u-y-Ad\, \dyu-l\, \d^u\ 1 ^ - 7 ^"''" 

\dxU — Ad'\, \v -\- Ad' y -\- Ad Ad' -\- p'\ J 

The condition $v(x, 0) = 0$ could be proved if the solution were more regular.
This result is likely. Indeed, the study of the function $A'^n(p'^n)$ suggests
that the pressure gradients corresponding to Lipschitzian displacements are
$\frac{1}{2}$-Hölderian. The available estimates suffice to analyse the behaviour of the
solution when $y \to +\infty$ but not when $y \to 0$.



3 Numerical Simulations 



We consider the equations in physical variables in order to compute recircula- 
tions. Finite element schemes and finite difference schemes have been written. 
Small recirculations have been computed and the stability of the scheme is sim- 
ilar to that of the other known schemes when u < 0. 

Figures 4 and 5 represent the streamlines and the pressure gradient corre- 
sponding to a null displacement and to the geometrical perturbation (see (7)) 



Ad{x) = 0.8 



1 -I- cos 



27T X 

(LW 



(26) 



A very surprising result of non-uniqueness with given pressure has been observed
[3]. Let $(u_1, v_1, p_1')$ be the solution corresponding to a displacement $Ad_1$.
Fig. 4. Streamlines for (26)

Fig. 5. Pressure gradient for (26)




Fig. 6. Curves x u\{x,yj) and x — > U 2 {x,yj) for 0 < < 10 corresponding 

to the same pressure gradient p\ ' . Dotted lines: solution u\ with recirculation 
for Adi given where Adi = 0.7 [l + cos + tt)] . Continuous lines: solution U 2 
without recirculation for pi ' given. Using (5), one can retrieve Adi and Ad 2 

Let us suppose this flow contains a recirculation. Then, we solve the problem
with given pressure using $p_1'$ so that the Goldstein singularity is avoided, and we
obtain a displacement $Ad_2$ and a solution $(u_2, v_2, p_1')$ which coincide with $Ad_1$
and $(u_1, v_1, p_1')$ before and after the recirculation, but $u_2$ is always nonnegative.
This second solution is then a solution without recirculation. The identity between
the solutions after the reattachment point tends to confirm the validity of
the solution with recirculation.

This non-uniqueness result is consistent with the absence of a direct relation
between $Ad$ and $p$ and with their difference of regularity ($Ad$ Lipschitzian and $p'$
$\frac{1}{2}$-Hölderian). This strengthens the difference with the Prandtl problem.
Cellular Neural Network Model for Nonlinear
Waves in Medium with Exponential Memory 



Peter Popivanov and Angela Slavova 
1 Nonlinear Waves in Medium with Memory



This paper deals with one-dimensional waves in a medium with memory. Following
[1], we denote by $x$ the coordinate of a point belonging to a solid body, by $t$
the time variable, by $\varepsilon$ the deformation, by $\sigma$ the tension and b

e{t) = f Vl + K* \J a {a) \/l + K* \J a dt- (1) 

J — OO 

In the previous equality $K*$ is the convolution operator:

$$K * u(t) = \int_{-\infty}^{t} K(t - \tau)\, u(\tau)\, d\tau. \qquad (2)$$



+ K* stands for the development of the operator 1 -|- K* into a power 
series and the integral operator \/l -|- K* as well as the multiplication operator 
a (tr) are acting on the function ctj. 

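It is well known from classical mechanics that the following equation holds:

$$\frac{\partial^2 \varepsilon}{\partial t^2} = \frac{\partial^2 \sigma}{\partial x^2}, \qquad (3)$$

supposing $\varepsilon$ and $\sigma$ to be smooth functions of $(t, x)$.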

Putting (1) into (3), we conclude that the tension $\sigma(t, x)$ satisfies a rather
complicated nonlinear integro-differential equation. According to Theorem 7.1
from [1], the equation (3) with $\varepsilon$ given by (1) can be sharply factorized into two
first-order factors describing the propagation of two waves of tension to the left
and to the right-hand side respectively.

Here are the factors: 

|vi + a'.^/7m±A 

(4) 

We shall concentrate our attention to (4). 
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Putting (Vl + K*)-^ = 1 - 

<P*u = f — t)u{t) dr 

J — OO 



we see that each smooth solution a of the nonlinear integro-differential equation 

= ^ 0) (5) 

will satisfy (3) with e given by (1). 

According to the mechanical terminology, the function $\Phi$ is called the “kernel of
heredity”. Assume that

$$\Phi(t) = k e^{-kt}, \quad k > 0.$$

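So a wave of tension propagating “to the right-hand side” is given
by the following nonlinear first-order equation:

$$\sqrt{a'(\sigma)}\,\frac{\partial \sigma}{\partial t} + \frac{\partial \sigma}{\partial x} - k \int_{-\infty}^{t} e^{-k(t - \tau)}\, \frac{\partial \sigma(\tau, x)}{\partial x}\, d\tau = 0.$$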



We shall assume, moreover, that 



i(0) = 0, a G a (a) > 0 and a € C^(x > 0). 



( 6 ) 

( 7 ) 



cr = 0 for cc > 0, t < 0, a(t, 0) = cto(^) C C^{R). (8) 

Obviously, $\sigma_0(t) = 0$ for $t \le 0$.

We shall construct a classical solution of the mixed problem (6), (8) and we
shall prove results on global existence for $t \ge 0$, $x \ge 0$ and on blow-up of
the corresponding solution. The symbol $\|\sigma_0\|_{C^0(\mathbb{R})}$ stands for the uniform norm
of the function $\sigma_0$. We suppose further on that $\|\sigma_0\|_{C^0(\mathbb{R})}, \|\sigma_0'\|_{C^0(\mathbb{R})} < \infty$.

This is our main result. 



Theorem 1. Consider the mixed problem (6), (8). Then 

(i) There exists a constant C'(||CTo(t)||c'o) such that if 
X CdlcTollco) < 1 then the problem (6), (8) possesses a unique global classical 
solution a G C^{x > 0,t > 0). 

The constant CdlCToUco) can be estimated in the following way: 



Cdlcrollco) < 




1 

™^|ff|<lkollco \/a'(^) 



.l/2max|CT|<||CTo||^o 



|g"(g^)l 

|a'(d)| ■ 



(a) at blows up for a finite X > 0 if 
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a), one can find a point /3o > 0 with the property 

where wo(/3o) = \/a'(A) d\, cto(/3q) > 0. 

h). one can find a point /3q > 0 such that cto(/3o) = 0; 

^ ^ '^o(/3o) 7^ 0,a (0) 0. 

CTo(/3o)a (0) 

The life span $X$ of the corresponding solution in case b) can be estimated in
the following way:



X<x = - 



1 

\/a'(0) 



ln{l + 



2fcg'(0) 

o'o(/3o)a"(0) 



) >0. 



Differentiating (6) in t we have 



_a 

dt 




d da da , , 



, It'V 

' — oo 



ax 



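So

$$\frac{\partial}{\partial t}\left(\sqrt{a'(\sigma)}\,\frac{\partial \sigma}{\partial t} + \frac{\partial \sigma}{\partial x} + k \int_0^{\sigma(t,x)} \sqrt{a'(\lambda)}\, d\lambda\right) = 0,$$

i.e.

$$\sqrt{a'(\sigma)}\,\frac{\partial \sigma}{\partial t} + \frac{\partial \sigma}{\partial x} + k \int_0^{\sigma(t,x)} \sqrt{a'(\lambda)}\, d\lambda = f(x).$$

According to (8), $f(x) = 0$.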

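So we have reduced the mixed problem (6), (8) to the following nonlinear equation:

$$\frac{\partial}{\partial t}\int_0^{\sigma(t,x)} \sqrt{a'(\lambda)}\, d\lambda + \frac{\partial \sigma}{\partial x} + k \int_0^{\sigma(t,x)} \sqrt{a'(\lambda)}\, d\lambda = 0, \qquad (9)$$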



aft, 0) = cTo(t), CTo(^) = 0, t < 0, CT = 0, for X > 0, t < 0, (7o G C^{R). 

Let us make the change of the unknown function

$$w = \int_0^{\sigma} \sqrt{a'(\lambda)}\, d\lambda. \qquad (10)$$

Obviously, w = > 0 there exists ip £ such that 

$$\sigma = \varphi(w) \qquad (11)$$

(i.e. $\varphi$ is the inverse function defined by (10)).
Then (9) can be rewritten in the form

$$\frac{\partial w}{\partial t} + \varphi'(w)\,\frac{\partial w}{\partial x} + k w = 0,$$

$$w(t, 0) = \int_0^{\sigma_0(t)} \sqrt{a'(\lambda)}\, d\lambda = w_0(t),$$

wo{t) = 0 for t < 0, mo C m = 0 for x > 0, t < 0. 

In fact, w{t,x) = w{a{t,x)) = a/o'(A) dX and therefore 



dcT 



dw dw da 



( 12 ) 



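Remark. The function $G(\sigma) = \int_0^{\sigma} \sqrt{a'(\lambda)}\, d\lambda$ is a diffeomorphism
$G : (-\infty, +\infty) \to (A, B)$, where $A = \int_0^{-\infty} \sqrt{a'(\lambda)}\, d\lambda$, $B = \int_0^{+\infty} \sqrt{a'(\lambda)}\, d\lambda$,
$-\infty \le A < 0 < B \le \infty$. Thus, $\sigma = G^{-1}(w) = \varphi(w)$ is well defined and smooth on the open interval
$(A, B)$, and $w_0(t) = G(\sigma_0(t)) \Leftrightarrow \sigma_0(t) = \varphi(w_0(t))$.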
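As an illustration of the change of unknown function (10)-(11), the sketch below assumes $\sqrt{a'(\lambda)} = e^{\lambda}$, a choice made here only because then $G(\sigma) = e^{\sigma} - 1$ and $\varphi(w) = \ln(w + 1)$ are available in closed form. The function names and the Simpson quadrature are illustrative assumptions, not part of the original construction.

```python
import math


def sqrt_a_prime(lam):
    # Assumed example: sqrt(a'(lambda)) = exp(lambda).
    return math.exp(lam)


def G(sigma, n=2000):
    """w = integral from 0 to sigma of sqrt(a'(lambda)) d(lambda).

    Composite Simpson rule with an even number n of subintervals.
    A negative sigma gives a negative step, hence the signed integral.
    """
    if sigma == 0.0:
        return 0.0
    h = sigma / n
    total = sqrt_a_prime(0.0) + sqrt_a_prime(sigma)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * sqrt_a_prime(i * h)
    return total * h / 3.0


def phi(w):
    """Inverse of G for this example: sigma = ln(w + 1), valid for w > -1."""
    return math.log(w + 1.0)


def phi_prime(w):
    """Derivative of the inverse function for this example."""
    return 1.0 / (w + 1.0)
```

For this particular choice the image interval of the Remark is $(A, B) = (-1, +\infty)$, so `phi` is only defined for $w > -1$.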



2 Cellular Neural Networks (CNNs) 

Cellular Neural Networks (CNNs) are nonlinear, continuous computing array
structures well suited for nonlinear signal processing. Since their invention in 1988
[2,3], the investigation of CNNs has evolved to cover a very broad class of
problems and frameworks. Many researchers have made significant contributions
to the study of CNN phenomena using different mathematical tools.

Definition 1. The CNN is a

i) 2-, 3-, or n-dimensional array of

ii) mainly identical dynamical systems, called cells, which satisfies two
properties:

iii) most interactions are local within a finite radius r, and

iv) all state variables are continuous valued signals.

Let us consider a two-dimensional grid with a 3x3 neighborhood system, as
shown in Fig. 1.

The squares are the circuit units - cells, and the links between the cells 
indicate that there are interactions between linked cells. One of the key features 
of a CNN is that the individual cells are nonlinear dynamical systems, but that 
the coupling between them is linear. Roughly speaking, one could say that these 
arrays are nonlinear but have a linear spatial structure, which makes the use of 
techniques for their investigation common in engineering or physics attractive. 

Definition 2. An M x M cellular neural network is defined mathematically by 
four specifications: 
Fig. 1.

1) CNN cell dynamics;

2) CNN synaptic law, which represents the interactions (spatial coupling)
within the neighbor cells;

3) Boundary conditions;

4) Initial conditions.

Suppose for simplicity that the processing elements of a CNN are arranged 
on a 2- dimensional (2-D) grid (Fig.l). Then the dynamics of a CNN, in general, 
can be described by: 

C(kl)GNr.{ij) 

+ E i,kl{ukhUij) + lij, 

C{kl)GNr{ij) 



(14) 

l<i<M,l<j<M, 

$x_{ij}$, $y_{ij}$, $u_{ij}$ refer to the state, output and input voltage of a cell $C(i, j)$; $C(ij)$
refers to a grid point associated with a cell on the 2-D grid, $C(kl) \in N_r(ij)$ is
a grid point (cell) in the neighborhood within a radius $r$ of the cell $C(ij)$, and $I_{ij}$ is
an independent current source. $A$ and $B$ are nonlinear cloning templates, which
specify the interactions between each cell and all its neighbor cells in terms of
their input, state, and output variables [9,10].

Now in terms of definition 2 we can make a generalization of the above 
dynamical systems describing CNNs. For a general CNN whose cells are made 
of time-invariant circuit elements, each cell C{ij) is characterized by its CNN 
cell dynamics : 

$$\dot{x}_{ij} = -g(x_{ij}, u_{ij}, I^s_{ij}), \qquad (15)$$

where $x_{ij} \in \mathbb{R}^m$ and $u_{ij}$ is usually a scalar. In most cases, the interactions (spatial
coupling) with the neighbor cell $C(i + k, j + l)$ are specified by a CNN synaptic
law:



^ij — ^ij^kl * f kli^^ij ^ ^i+k,j+l^ 



(16) 



The first term $A_{ij,kl}\,x_{i+k,j+l}$ is simply a linear feedback of the states of the
neighborhood nodes. The second term provides an arbitrary nonlinear coupling,
and the third term accounts for the contributions from the external inputs of
each neighbor cell that is located in the $N_r$ neighborhood.

As stated in [4,8], some autonomous CNNs (there are no inputs, i.e.
$u_{ij} = 0$) represent an excellent approximation to nonlinear partial differential
equations (PDEs). Although the CNN equations describing reaction-diffusion
systems involve a large number of cells, they can exhibit new phenomena
that cannot be obtained from their limiting PDEs. This demonstrates that an
autonomous CNN is in some sense more general than its associated nonlinear
PDE.



3 CNN Model for Nonlinear Waves in Medium with 
Memory 

Let us consider equation (12) in the following form:

$$\frac{\partial w}{\partial t} = -\varphi'(w)\,\frac{\partial w}{\partial x} - k w. \qquad (17)$$

To solve such an equation, a spatial discretization has to be applied. The
PDE is transformed into a system of ODEs, which is identified with the state
equations of an autonomous CNN with appropriate templates. The discretization
in space is made in equidistant steps $h$. We map $w(x, t)$ into a CNN layer
such that the state voltage of a CNN cell $x_{ij}(t)$ at a grid point $(i, j)$ is associated
with $w(ih, t)$, $h = \Delta x$. Hence, the following CNN model is obtained:



dwi / (wi+i - Wi_i) 

- = j k„. 



(18) 



If we compare the above equation with the state equation of nonlinear CNN 
we directly find the templates: 



-* -"A 

We will consider the following examples for our CNN model (18): 

Let a(A) = ^ Then w = ^fa d\ = e'^ — 1 ct = ln{w + 1) and 

v\w) = 



2 
a). The initial condition is:



f 0, t<0, 

—^sint, t > 0. 




Fig. 2. 



b). The initial condition is: 



f 0, <<0, 

1 — cos t,t>0. 
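A model of the type (18) can be explored with a simple method-of-lines integration. The Python sketch below is only an illustration under stated assumptions: the spatial derivative is approximated by a central difference over $2h$, time stepping uses the classical fourth-order Runge-Kutta scheme, $\varphi'(w) = 1/(w + 1)$, the boundary signal at $x = 0$ is $1 - \cos t$ for $t > 0$, and the last cell uses a one-sided difference. None of these choices, nor the parameter values, is taken from the paper.

```python
import numpy as np


def phi_prime(w):
    # Assumed example: sigma = ln(w + 1), so phi'(w) = 1 / (w + 1).
    return 1.0 / (w + 1.0)


def boundary_value(t):
    # Assumed boundary signal at x = 0: zero for t <= 0, 1 - cos t for t > 0.
    return 1.0 - np.cos(t) if t > 0.0 else 0.0


def rhs(w, t, h, k):
    """Semi-discrete right-hand side for the cells at x = h, 2h, ..., N h."""
    full = np.concatenate(([boundary_value(t)], w))   # prepend the cell x = 0
    dwdx = np.empty_like(w)
    dwdx[:-1] = (full[2:] - full[:-2]) / (2.0 * h)    # central difference
    dwdx[-1] = (full[-1] - full[-2]) / h              # one-sided at the last cell
    return -phi_prime(w) * dwdx - k * w


def simulate(n_cells=100, h=0.05, k=1.0, dt=0.01, t_end=2.0):
    """Integrate the cell states from rest with classical RK4."""
    w = np.zeros(n_cells)
    t = 0.0
    for _ in range(int(round(t_end / dt))):
        k1 = rhs(w, t, h, k)
        k2 = rhs(w + 0.5 * dt * k1, t + 0.5 * dt, h, k)
        k3 = rhs(w + 0.5 * dt * k2, t + 0.5 * dt, h, k)
        k4 = rhs(w + dt * k3, t + dt, h, k)
        w = w + (dt / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
        t += dt
    return w
```

Calling `simulate()` returns the cell states at the final time. The time step is kept well below $h$ so that the explicit scheme stays within its stability range for wave speeds $\varphi'(w) \le 1$.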
Abstract. The nonlinear instability of a compound jet consisting of a
liquid core and an immiscible coaxial liquid layer is studied. The equations
of motion for both liquids (phases) are used in a one-dimensional (1-D)
approximation similar to that known for a one-layer jet. A numerical method
is proposed for calculating the radiuses of both interfaces and the axial
velocities of the core and outer layer. The method is tested for determining
the typical forms of compound jet disintegration.



1 Introduction 

The compound jet generation principles and a qualitative description of the
hydrodynamics of the jet have been given by Hertz and Hermanrud [1]. In their
experiments they observed three different types of compound jet instability,
namely capillary, sinuous and varicose instability, depending on the jet velocity.
The present paper is restricted to the analysis of the capillary instability only.
The latter manifests itself in the disintegration of the jet into drops of different
configurations and sizes.

The first models developed to study this kind of compound jet instability
are based on the one-dimensional approximation of the Navier-Stokes equations.
Based on this approximation, Radev and Shkadov [2] performed a linear analysis of the jet
instability which reveals three different break-up regimes, namely
breaking as a single jet, breaking of the core and disintegration by meeting of
the interfaces. (Further on, for brevity, these regimes will be referred to as the First,
Second and Third break-up regimes, respectively.) A similar analysis is proposed
by Sanz and Meseguer [3].

As could be expected, the above linear models are well suited to the initial
evolution of the perturbations along the jet but fail to predict the final break-up
configuration, which is strongly controlled by the nonlinear effects. The latter
are taken into account in Epikhin et al. [4] and Radev et al. [5], in which the jet
flow is assumed to have a uniform velocity profile and is approximated by one-dimensional
equations of motion. The disturbances are considered periodic in space with a
given wave length, and their amplitude increases in time. The analysis in Epikhin
et al. [4] is made by a decomposition of the disturbances in a Fourier series
with unknown amplitudes, while in Radev et al. [5] a spline-difference numerical
method is proposed. The experimental observations that the jet break-up gives
rise to both main and satellite drops are confirmed numerically as well. Moreover
it is shown that the satellites for the First disintegration regime are formed from
the core liquid only and are entrained by the layer flow. In the Second regime
compound satellite drops appear, consisting of a core and a concentric layer
formed from the jet core and the surrounding layer respectively.

For completeness it should be mentioned that 2-D models of the compound 
jet instability are proposed in Tchavdarov and Radev[6] and Tchavdarov et al. 
[7]. In the former a linear analysis is performed while the latter is concerned 
with a direct numerical simulation. 

The present paper deals with the nonlinear instability of a one-dimensional 
compound jet. A numerical method is proposed for calculating the evolution 
in time of both the interface radiuses and core and layer velocities. It allows 
accounting for a stepwise profile of the undisturbed velocity. The method is 
illustrated by the typical disintegration forms of the jet. 



2 Statement of the Problem 



The compound jet shown in Fig. 1 consists of an axisymmetric liquid core
of (undisturbed) radius $H_1$ and density $\rho_1$ and a surrounding coaxial layer of
another immiscible liquid of outer radius $H_2$ and density $\rho_2$. Both liquids are
assumed incompressible and nonviscous. Hereafter the subscript $j = 1$ is set for
the core, whereas $j = 2$ is used for the layer.

The jet flow is related to a cylindrical coordinate system $(r, z)$, whose $z$-axis
is directed along the jet axis. By using $H_*$ and $U_*$ as length and velocity
scales respectively, the 1-D equations of motion of the jet can be written in
the following nondimensional form (for more details see Radev and Shkadov
(1985) [2])



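$$\frac{\partial u_j}{\partial t} + u_j\,\frac{\partial u_j}{\partial z} = -\frac{\partial p_j}{\partial z}, \quad j = 1, 2, \qquad (1)$$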



where the axial velocities $u_j = u_j(t, z)$ and the pressures $p_j = p_j(t, z)$ are
unknown functions of the time and axial coordinate.

Partial differential equations for the unknown radiuses $r = h_j(t, z)$ of the
inner and outer interfaces are derived from the mass-conservation equation written
simultaneously for the core and layer

$$\frac{\partial h_1}{\partial t} + u_1\,\frac{\partial h_1}{\partial z} + \frac{1}{2}\,h_1\,\frac{\partial u_1}{\partial z} = 0, \qquad (2)$$
Fig. 1. Compound jet section of length $\lambda$ related to a cylindrical coordinate
system. The undisturbed core and jet are assumed to have constant radiuses ($H_1$
and $H_2$ respectively) and uniform axial velocities ($U_1$ and $U_2$), the latter
allowing for a velocity jump (discontinuity) $\Delta U = U_1 - U_2 \ge 0$; $\lambda$ stands for the
wave length of the imposed disturbances



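$$\frac{\partial h_2}{\partial t} + u_2\,\frac{\partial h_2}{\partial z} + \frac{1}{2}\left(h_2 - \frac{h_1^2}{h_2}\right)\frac{\partial u_2}{\partial z} + \frac{1}{2}\,\frac{h_1^2}{h_2}\,\frac{\partial u_1}{\partial z} + (u_1 - u_2)\,\frac{h_1}{h_2}\,\frac{\partial h_1}{\partial z} = 0. \qquad (3)$$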



The pressure terms in eq. (1) are given in the form 

P2 • 1 o 

Pj = —Pj+i + ) J = 1 , 2 , 

Pj 

where Kj are the mean curvature of the interfaces 



1 + 



9hj 

dz 



-1/2 



1 + 



9hj 

dz 



d^h, 



( 4 ) 



( 5 ) 



while $\sigma_j = T_j/(\rho_j H_* U_*^2)$ denote the corresponding inverse Weber numbers
related to the inner and outer surface tensions $T_j$.
In the absence of gravity it is convenient to seek spatially periodic solutions
of the above system of partial differential equations, that is

$$h_j(t, z + \lambda) = h_j(t, z), \qquad u_j(t, z + \lambda) = u_j(t, z),$$

$$\frac{\partial h_j}{\partial z}(t, z + \lambda) = \frac{\partial h_j}{\partial z}(t, z), \qquad \frac{\partial u_j}{\partial z}(t, z + \lambda) = \frac{\partial u_j}{\partial z}(t, z), \qquad (6)$$

where $\lambda$ represents the wave length.



3 Linear Instability Analysis of a Compound Jet 

In the context of the linear instability analysis the jet flow is decomposed into
a steady and a nonsteady (disturbed) part. In the steady case the system (1)-(5)
allows a simple solution of the form

$$h_j = H_j, \qquad u_j = U_j, \qquad (7)$$

representing a compound jet of constant radiuses and uniform axial velocities of
the core and coaxial layer.

The perturbed flow is given in the form

$$h_j(t, z) = H_j + \tilde{h}_j(t, z), \quad u_j(t, z) = U_j + \tilde{u}_j(t, z), \quad p_j(t, z) = P_j + \tilde{p}_j(t, z), \qquad (8)$$

assuming that the terms nonlinear in the disturbances are small
enough to be neglected. The solution of the linearized boundary value problem
(1)-(8) appears in an analytical form

$$(\tilde{h}_j, \tilde{u}_j, \tilde{p}_j)(t, z) = (\hat{h}_j, \hat{u}_j, \hat{p}_j)\exp[i\alpha(z - ct)], \qquad (9)$$

where $\alpha = 2\pi/\lambda$ is a given wave number, while the complex amplitudes $\hat{h}_j$, $\hat{u}_j$, $\hat{p}_j$
and the complex phase velocity of the perturbations

$$c = \frac{\omega}{\alpha} + i\,\frac{q}{\alpha} \qquad (10)$$

are unknown. In equation (10) $\omega$ denotes the angular frequency, while $c_r = \omega/\alpha$
stands for the phase velocity and $\alpha c_i = q$ for the growth rate of the
disturbances. The complex phase velocity and the wave number are connected
by the following (usually called dispersion) equation

(C/i - c)4 - 2(Ci - C/ 2 )(C/i - c)3+ 

[(C/l - U 2 )^ + i(T 2 (l - <j2)(l - a2) + A] (C/i - c)2- 

2Ai{Ui - U 2 ){Ui - c)+ 

[Ai{Ui - C/ 2 )^ + i(TiCT2(5“^(l - (5^)(1 - - a^)] = 0 , 



( 11 ) 
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where 



Al 



1 

2 



aiS ^(1 — + (T2<5^ — (1 

Pi 




( 12 ) 



In principle the initial conditions for the system (1)-(5) should satisfy the
equation (6); otherwise they can be chosen arbitrarily. However, from a physical
point of view it is of interest to be able to study the evolution
of initially small disturbances up to the break-up point. Following the linear
instability theory in Radev and Shkadov [2], the form of the jet perturbations of
sufficiently small amplitudes is derived from the linearized equations (1)-(5).
Below we briefly present some details concerning the linear instability analysis
of a compound jet, which will be used in the formulation of initial conditions for
the equations (1)-(5) fitted to the linear solution. For our further considerations
we will need some details concerning the solutions of the dispersion equation.




Fig. 2. Amplification rate of the disturbances versus wave number a at a zero 
undisturbed velocity jump. AU = 0,cti = 0.015,5 = 0.5,p2/pi = 1; Curves 1 
and 1 : (J\ju 2 = 100, 2 and 2 : a\ju 2 = 0.1. The superscript and above 
denote the first and second linear modes respectively. The maximum growth rate 
within the curve 1 is controlled by the inner surface tension. When the outer 
surface tension increases this maximum moves into the range of the long waves 
(curve 2 ) 

Fig. 3. The effect of the undisturbed velocity jump on the growth rate. AU = 
0.5, CTi = 0.015,5 = 0.5,P2 /pi = 1- Curves 1 , 1 and 1 : cti/ct 2 = 100, 2 , 2 
and 2 : cri/cr 2 = 0.1. The superscripts , and above denote the first, second 
and third linear modes respectively. In the interval of the very short waves a 
third mode is burned (curve 1 ). When the outer surface tension increases this 
mode moves into the range of the long waves with the highest growth rate inside 
it (curve 2 ). Simultaneously the second mode (curve 2 ) tends to move above 
the first mode (curve 2 ) at the begining of long wave interval 
This is an algebraic equation of fourth order for calculating the complex phase
velocity $c$ as a function of the wave number at given values of the nondimensional
parameters $\sigma_j$, $U_j$, $\delta = H_1/H_2$, $\rho_0 = \rho_2/\rho_1$. After determining the complex
phase velocity, the unknown complex amplitudes $\hat{h}_j$, $\hat{u}_j$ can be found from the
linearized equations (1)-(8), provided that the value of one of these amplitudes
is given.
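Numerically, the roots of such a fourth-order equation can be obtained with a standard polynomial root finder. The sketch below assumes the dispersion equation has already been written as a quartic in $X = U_1 - c$ with known coefficients. The function name `mode_growth_rates` is hypothetical, and the coefficient values in the usage note are placeholders, not those of equation (11).

```python
import numpy as np


def mode_growth_rates(coeffs, U1, alpha):
    """Growth rates q = alpha * Im(c) of the four roots of a quartic in X = U1 - c.

    coeffs: the five polynomial coefficients, highest degree first.
    Returns the growth rates sorted from largest to smallest.
    """
    X = np.roots(coeffs)        # roots in the shifted variable X = U1 - c
    c = U1 - X                  # complex phase velocities
    q = alpha * c.imag          # growth rates; positive means unstable
    return np.sort(q)[::-1]
```

For the placeholder biquadratic $X^4 + 5X^2 + 4 = 0$ with $U_1 = 1$ and $\alpha = 0.5$, `mode_growth_rates([1, 0, 5, 0, 4], 1.0, 0.5)` gives growth rates close to 1, 0.5, -0.5 and -1, that is, two growing and two decaying modes, as for a biquadratic equation with two pairs of complex conjugate roots.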

In the particular case when the undisturbed velocity profile is uniform in the 
both phases {Ui = U 2 ) eq. (11) is reduced to a biquadratic equation. It is easily 
seen that in general this equation has two pairs of complex conjugate roots: 
the first one is defined within the wave number interval 0 < a < while 
the second - in 0 < a < 1 . The two branches (further on called modes) with 
positive imaginary parts Ci define two families of disturbances which grow with 
amplification rates equal to q = aci and propagate with one and the same phase 
velocity Cj. = [/i. In Fig. 2 the ’ q — a curves for both modes are illustrated for 
two characteristic values of the ratio 02 ! o\ of the surface tensions. If as usually 
we assume that in natural conditions the jet is disintegrated by the disturbances 
of a higher amplification rate then in Fig. 2 they correspond to the maximum 
oi ” q — a curve related to the first mode. However in the case of CT 2 /(Ti >> 1 
this maximum {q*j) is attached to the wave number close to the Rayleigh one 
a* « -\/2/2 and is controlled by the outer surface tension. In the case 02 ! cf\ « 1 
the maximum (q^) moves to the range of the shorter waves {a** « \f2j2S) being 
controlled by the inner interface. 

The " q — a curves in the case of a stepwise velocity profile are shown in 
Fig. 3 for a given value of the velocity jump AU = Ui — U 2 >0. The main 
difference in respect to the case of a continuous velocity profile manifests itself 
in the appearance in the range of the short waves of a new unstable mode, 
resulting in a third family of growing disturbances. The maximum growth rate 
of the disturbances q}jj and the corresponding wave number a*** depends on the 
value of the velocity jump AU: when the latter increases the maximum growth 
rate increases as well, while the wave number a*** moves into the direction of 
the longer waves. Looking at Fig. 3 it should be mentioned that at sufficiently 
high values of AU the maximum growth rate corresponding to the second mode 
(q}j) may become higher than to the first mode {q}))- 

Coming back to the nonlinear boundary-value problem (1)-(6), it is quite
natural to apply equations (8) and (9) as initial conditions for this problem.
It is important to note that in the conditions (8) and (9) one of the complex
amplitudes, say $\hat{h}_j$, must be considered as an additional input parameter of the
nonlinear problem. It will be denoted by $h_{j0}$ to point out that this is the initial
amplitude of the corresponding interface radius at time $t = 0$. Since the
complex phase velocity is explicitly involved in the linearized form of equations
(1)-(5) (not written in the paper), the number of the selected mode will act as a
second input parameter in the initial conditions (8) and (9).
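To make the role of the initial amplitude concrete, the following sketch evaluates an initial interface shape of the form (8)-(9) at $t = 0$, taking the real part of the complex exponential. This is an illustration only: the function name `initial_interface` and the sample values are assumptions, and the relation between the amplitudes of the different unknowns (which comes from the linearized equations) is not reproduced here.

```python
import numpy as np


def initial_interface(H, amp0, alpha, z):
    """Interface radius at t = 0: H + Re(amp0 * exp(i * alpha * z)).

    H: undisturbed radius; amp0: (possibly complex) initial amplitude;
    alpha: wave number; z: array of axial positions.
    """
    z = np.asarray(z, dtype=float)
    return H + np.real(amp0 * np.exp(1j * alpha * z))
```

For example, `initial_interface(1.0, 0.01, 0.707, z)` gives a shape that is periodic in `z` with wave length $2\pi/0.707$ and equals 1.01 at `z = 0`.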
4 Numerical Method

In order to eliminate the disturbance translation along the jet axis it is conve- 
nient to introduce new independent variables (^, r) and new dependent variables 
{wj,IIj) as follows 



^ = az — ujt, T = 0 < ^ < 2tt. (13) 

% = + y/^Wj, Uj = (T~^pj. (14) 

In these expressions $\omega = \alpha c_r$ and $\sigma_*$ stands for $\sigma_2$ (or $\sigma_1$).

Following [8], for solving the nonlinear boundary value problem (1)-(6), written
in the new variables, we use the Continuous Analog of Newton Method (CANM).
A second-order finite difference method is applied to discretize the resulting CANM
problem. All results shown in the figures are obtained using the Crank-Nicolson
difference scheme with steps $h_\xi = \pi/200$, $h_\tau = 0.01$. The CANM needs
2-3 iterations to solve the problem in each layer $\tau_k = k h_\tau$.

The jet disintegration time $\tau_b$ is determined when one of the following
conditions is satisfied

mm/ii(Tb,$) < 10“^, or min(h 2 (rh ,0 - hi(n,^)) < 10~^. 
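The stopping rule above amounts to monitoring two minima over the grid at every time layer. A minimal Python sketch is given below. The tolerance `eps` is an assumption (the exponent of the threshold is not legible in the source), and the function name and returned labels are hypothetical.

```python
import numpy as np


def breakup_mode(h1, h2, eps=1e-3):
    """Check the two disintegration conditions on one time layer.

    h1, h2: inner and outer interface radii sampled on the grid.
    Returns "core" if the inner interface reaches the axis,
    "interfaces" if the two interfaces meet, and None otherwise.
    """
    h1 = np.asarray(h1, dtype=float)
    h2 = np.asarray(h2, dtype=float)
    if np.min(h1) <= eps:
        return "core"
    if np.min(h2 - h1) <= eps:
        return "interfaces"
    return None
```

In a time-stepping loop this check would be called after each layer, and the first layer at which it returns a label would give the disintegration time.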

5 Results and Discussion 

Since the problem described above is multiparametric,
it is rather difficult to illustrate the effect of all the parameters involved. For
that reason we will limit our discussion to the case of zero velocity jump $\Delta U$. In these
conditions the jet instability is mainly controlled by the ratio $\sigma_1/\sigma_2$ of the
surface tensions, whose effect will be analysed below. The values of the remaining
nondimensional parameters will be fixed as follows:

$$\sigma_1 = 0.015, \quad \delta = 0.5, \quad \rho_2/\rho_1 = 1, \quad h_{20} = 0.01. \qquad (15)$$

Moreover we will concentrate our attention on the cases when the jet is
initially excited by the perturbations (8) and (9) related to the first mode of the
dispersion equation. In general the calculations will be performed for the wave
number of the highest amplification rate. The effect of the second and third
modes remains to be studied additionally.



5.1 Compound Jet Disintegration at $\sigma_1/\sigma_2 \ll 1$

In this case the jet instability is controlled by the outer surface tension. The 
jet disintegration behaves like one-layer jet break-up, as shown in Fig. 4, whose 
parameters correspond to the curve 2 in Fig. 2. The resulting main and satellite 
drops are compound as well and consist of a core and concentric layer formed 
by the inner and outer liquid respectively. 
5.2 Compound Jet Disintegration at $\sigma_1/\sigma_2 \gg 1$

When the inner surface tension prevails, the jet instability appears as a core
disintegration resulting in main and satellite drops, which after breaking are
entrained by the surrounding liquid. This disintegration regime of the compound
jet is demonstrated in Fig. 5, whose parameters correspond to curve 1 in Fig. 2. It
should be mentioned that after the core break-up the jet still remains continuous
up to the breaking of the outer interface. However, this break-up regime is outside
the scope of our model.

5.3 Compound Jet Disintegration at $\rho_2 < \rho_1$

A new type of jet disintegration appears if, in the range $\sigma_1/\sigma_2 \ll 1$, the density of
the outer liquid is decreased below the density of the core. As shown in Fig. 6, the
minimum distance between the interfaces becomes zero while the inner interface
is still far from the jet axis. This form of jet disintegration is admissible in the
numerical experiments only if $\rho_2 < \rho_1$. However, in contrast to the disintegration
regimes shown in Fig. 4 and Fig. 5, the one in Fig. 6 remains to be demonstrated
experimentally.





Fig. 4. Compound jet break-up as a one-layer jet. $\sigma_1/\sigma_2 = 0.1$, $\alpha = 0.707$, $\Delta U = 0$,
$\tau_b = 8.24$. The remaining input parameters are given in (15). The jet is amplified
by the corresponding first mode (curve 2 in Fig. 2). Both interfaces break up
simultaneously at the same points, forming one main and one satellite compound
drop within one wave length

Fig. 5. Compound jet disintegration due to the core break-up. $\sigma_1/\sigma_2 = 100$, $\alpha = 1.41$,
$\Delta U = 0$, $\tau_b = 0.41$. The remaining input parameters are given in (15). The
jet is amplified by the corresponding first mode (curve 1 in Fig. 2). The core
breaks up first while the layer still exists as a coherent portion. The main
and satellite drops detached from the core are entrained by the outer flow

Fig. 6. Compound jet disintegration due to the meeting of the interfaces.
$\sigma_1/\sigma_2 = 0.1$, $\alpha = 0.73$, $\tau_b = 9.06$, $\rho_2/\rho_1 = 0.5$. The values of $\sigma_1$, $\delta$ and $h_{20}$ are
given in (15). The jet is amplified by the corresponding first mode. The outer
interface approaches the inner one faster than the latter reaches the jet axis



6 Conclusion 

The nonlinear instability of a compound jet is studied by following the evolution of initially 
small disturbances up to the jet disintegration. It is shown that the nonlinear 
effects significantly affect the final stages of the jet disintegration. The type of 
the latter, as well as the type of satellite formation, is mainly controlled by the 
ratio of the inner and outer surface tensions. The numerical method developed 
on the basis of one-dimensional equations of motion accounts for a discontinuity 
(jump) of the velocity between the two phases. However, the effect of the velocity jump 
on the jet instability remains to be studied separately. 

References 

1. Hertz, C. H., Hermanrud, B.: A liquid compound jet, J. Fluid Mech., 131 (1983) 
271-287 

2. Radev, S. P., Shkadov, V. Ya.: On the stability of two-layer capillary jets, Theor. 
and Appl. Mech., Bulg. Acad. Sci., 3 (1985) 68-75 (in Russian) 

3. Sanz, A., Meseguer, J.: One-dimensional linear analysis of the compound jet, J. 
Fluid Mech., 159 (1985) 55-68 

4. Epikhin, V. E., Radev, S. P., Shkadov, V. Ya.: Instability and break-up of two-layer 
capillary jets, Izv. AN SSSR, Mech. Jidkosti I Gaza, 3 (1987) 29-35 (in Russian) 

5. Radev, S. P., Boyadjiev, T. L., Puzynin, I. V.: Numerical study of the nonlinear 
instability of a two-layer capillary jet, JINR Communications P5-86-699, Dubna, 
1986 (in Russian) 

6. Radev, S., Tchavdarov, B.: Linear capillary instability of compound jets, Int. J. 
Multiphase Flow, 14 (1988) 67-79 

7. Tchavdarov, B., Radev, S., Minev, P.: Numerical analysis of compound jet disinte- 
gration, Comput. Methods Appl. Mech. Engrg., 118 (1994) 121-132 

8. Radev, St., Koleva, M., Kaschiev, M., Tadrist, L.: Initial perturbation effects on the 
instability of a viscous capillary jet, Recent Advances in Numerical Methods and 
Applications, Proc. of 4th Int. Conf. Num. Meth. Appl., 1998, Sofia, Bulgaria (ed. 
O. Iliev, M. Kaschiev, S. Margenov, Bl. Sendov, P. Vassilevski), pp. 774-882, World 
Scientific Publ. 




Modelling of Equiaxed Microstructure 
Formation in Solidifying Two-Component Alloys 



Norbert Sczygiol 

Technical University of Częstochowa, 
ul. Dąbrowskiego 73, 42-200 Częstochowa, Poland 
norbert.sczygiol@imipkm.pcz.czest.pl 



Abstract. The paper deals with the numerical modelling of equiaxed mi- 
crostructure formation during the solidification of two-component alloys 
poured into metal moulds. The basic enthalpy formulation was applied to 
model the solidification. The formulation allows the characteristic di- 
mensions of the computed microstructure to be taken into account in 
thermal calculations. The so-called indirect model of solidification (solid phase 
growth), which allows the modelling of all possible solidification courses, 
from equilibrium to non-equilibrium solidification, was used to model the 
equiaxed microstructure formation. This model was derived from an 
approximate solution of the diffusion equation of solute in a single grain. 
The equiaxed grain size was made dependent on the average cooling velocity 
at the moment when the liquid metal reached the temperature of the be- 
ginning of solidification. The simulation was performed using the 
NuscaS computer program, which has been developed at the Technical 
University of Częstochowa. 



1 Introduction 

Casting is one of the production methods for machine elements and equipment. 
Cast products are characterised by the fact that their shapes and properties are 
formed while the liquid metal passes to the solid state. Casting solidification 
is a heterogeneous process: it proceeds differently at every point of the casting. 
All possible solidification courses lie between two extreme cases, equilibrium 
and non-equilibrium solidification. 

Both extreme cases are generally difficult to reach in real casting, and the 
solidification courses characteristic of the majority of castings lie between them. 
Significant solute diffusion in the solid phase of the growing grains commonly 
occurs in these courses. This type of solidification can be 
called indirect solidification. The solute diffusion has a great influence on the 
microstructure formed during solidification. 

The casting microstructure is mainly composed of three zones of grains: 
equiaxed chill, columnar and equiaxed. The last one can have a dendritic struc- 
ture. In many cases the microstructure of the whole casting is composed only of 
equiaxed grains. This often occurs in non-ferrous metal castings. 

2 Solidification Model 



Solidification is described by a quasi-linear heat conduction equation containing a 
heat source term, which describes the rate of latent heat evolution 

$\nabla \cdot (\lambda \nabla T) + \rho_s L \frac{\partial f_s}{\partial t} = c\rho \frac{\partial T}{\partial t}$, (1) 

where $\lambda$ is the thermal conductivity coefficient, $c$ is the specific heat, $\rho$ is the 
density (subscript $s$ refers to the solid phase, $l$ denotes the liquid phase 
and $f$ denotes the transition from the liquid to the solid state), $L$ is the latent 
heat of solidification and $f_s$ is the solid phase fraction. This equation forms the 
basis of the thermal description of solidification. Taking into consideration the 
enthalpy, defined as follows [1,2] 

$H(T) = \int_{T_{ref}}^{T} c\rho \, dT + \rho_s L (1 - f_s(T))$, (2) 

where $T_{ref}$ is the reference temperature, one can pass to the enthalpy formu- 
lations of solidification. A few types of enthalpy formulation exist [1,2,3]. The 
so-called basic enthalpy formulation, which can be presented as [1,2,3,4,5] 

$\nabla \cdot (\lambda \nabla T) = \frac{\partial H}{\partial t}$, (3) 

is applied in this paper. Eq. (3) is obtained by differentiating the enthalpy given 
by Eq. (2) with respect to time 

$\frac{\partial H}{\partial t} = c\rho \frac{\partial T}{\partial t} - \rho_s L \frac{\partial f_s}{\partial t}$, (4) 

and substituting the result into Eq. (1). 

The finite element method was used to solve Eq. (3) numerically. As a re- 
sult of semi-discretisation, using the Bubnov-Galerkin method, the following 
equation was obtained 

$M\dot{H} + K(T)T = b(T)$, (5) 

where $M$ is the mass matrix, $K$ is the conductivity matrix, $H$ is the enthalpy 
vector, $T$ is the temperature vector and $b$ is the right-hand side vector. This 
equation must be integrated over time. As the properties of the casting material 
depend on temperature, it is best to apply a time integration scheme that elimi- 
nates the need to find the actual values of the material properties for the 
calculated temperatures iteratively. The two-step Dupont II scheme can be ap- 
plied for this purpose [1]. However, a two-step scheme needs two starting 
levels, so the first step must be made with a one-step scheme, here the modified 
Euler-backward scheme [6], in 
which the values of material properties are calculated on the basis of a known 
temperature. 

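The final form of Eq. (5), after the application of the modified Euler- 
backward scheme, is as follows [4,5] 

$\left(M + \Delta t\,K^{n}\left[\frac{dT}{dH}\right]^{n}\right)H^{n+1} = \left(M + \Delta t\,K^{n}\left[\frac{dT}{dH}\right]^{n}\right)H^{n} - \Delta t\,K^{n}T^{n} + \Delta t\,b^{n+1}$, (6) 

while the application of the Dupont II scheme gives 

$\left(M + \frac{3}{4}\Delta t\,K^{\circ}\left[\frac{dT}{dH}\right]^{n+1}\right)H^{n+2} = \left(M + \frac{3}{4}\Delta t\,K^{\circ}\left[\frac{dT}{dH}\right]^{n+1}\right)H^{n+1} - \frac{3}{4}\Delta t\,K^{\circ}T^{n+1} - \frac{1}{4}\Delta t\,K^{\circ}T^{n} + \frac{3}{4}\Delta t\,b^{n+2} + \frac{1}{4}\Delta t\,b^{n}$. (7) 
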

The superscript (°) denotes that the thermal conductivity coefficient is cal- 
culated for an extrapolated temperature according to the equation 

$T^{\circ} = \frac{3}{2}T^{n+1} - \frac{1}{2}T^{n}$. (8) 


The mass matrix does not contain any material properties because these 
properties are contained in the enthalpy. The $dT/dH$ matrix arises from the expan- 
sion of the temperature function into a Taylor series, for the time level $n+1$ in 
Eq. (6) and $n+2$ in Eq. (7). It is a diagonal matrix with coefficients calculated 
for particular nodes of a finite element. These coefficients are calculated on the 
basis of equations obtained by differentiating Eq. (2) with respect to 
temperature in the appropriate temperature intervals. For the interval in which 
solidification takes place, one obtains 

$\frac{dT}{dH} = \frac{1}{(c\rho)_f - \rho_s L \frac{df_s}{dT}}$, $T_S < T < T_L$, (9) 

where $T_L$ is the temperature of the beginning of solidification (liquidus tempera- 
ture) and $T_S$ is the temperature of the end of solidification. The application of 
the above expression in Eqs. (6) and (7) requires knowledge of the relationship 
between the solid phase fraction and temperature. Moreover, it is possible to take the 
forming microstructure directly into account in the above formulation. 

From the solution of Eqs. (6) and (7) the enthalpies are obtained. These 
enthalpies are recalculated into temperatures on the basis of the functions derived 
from Eq. (2) for particular temperature ranges [5]. 
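
The conversion between temperature and enthalpy can be illustrated with a minimal sketch. It is not the procedure of [5]: the paper inverts Eq. (2) analytically in each temperature range, whereas the sketch below inverts it by bisection. The constant product of specific heat and density, the linear solid fraction between solidus and liquidus, the choice of reference temperature and all function names are assumptions made only for illustration.

```python
# Illustrative sketch of Eq. (2) and its inversion; NOT the method of [5].
# Assumptions: constant c*rho, a linear f_s(T) between T_S and T_L,
# T_ref = T_S, and bisection instead of range-wise analytic inversion.
RHO_S, C, L = 2824.0, 1077.0, 390000.0   # kg/m^3, J/(kg K), J/kg
T_S, T_L, T_REF = 886.0, 926.0, 886.0    # K

def solid_fraction(T):
    """Assumed linear solid fraction: 1 below T_S, 0 above T_L."""
    if T <= T_S:
        return 1.0
    if T >= T_L:
        return 0.0
    return (T_L - T) / (T_L - T_S)

def enthalpy(T):
    """Eq. (2) with constant c*rho: c*rho*(T - T_ref) + rho_s*L*(1 - f_s(T))."""
    return C * RHO_S * (T - T_REF) + RHO_S * L * (1.0 - solid_fraction(T))

def temperature(H, lo=300.0, hi=1500.0, tol=1e-9):
    """Invert the monotonically increasing map T -> H(T) by bisection."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if enthalpy(mid) < H:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Because the solid fraction decreases with temperature, the enthalpy is strictly increasing, which is what makes the inversion well defined.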



3 Solid Phase Growth Model 

The behaviour of metal alloys in terms of temperature and chemical constitution 
is presented with the help of phase diagrams (Fig. 1). The solidus temperature for 
the equilibrium solidification model is shown as $T_S$, and the solidus temperature 
for the indirect solidification model is shown as $T_{SE}$. The possible solidification 
runs, between the solidus and liquidus lines, are schematically shown for an alloy in 
which the solute concentration is equal to $C_0$. 

In the case of the non-equilibrium solidification model the eutectic temper- 
ature, $T_E$, is always reached by the solidifying alloy (line 1). This means that 
a certain last portion of the metal solidifies at a constant temperature. In the 
case of the equilibrium solidification model (line 2) the temperature of the end of 
solidification depends on the chemical composition of the alloy. For the indirect 
solidification model (line 3) the solidification run depends on the diffusion path 
length of the solute and hence on the grain size in the solidifying microstructure. 

It is possible to obtain an analytical function which describes the relation- 
ship between the solid phase fraction and temperature for two-component metal 
alloys. This function can be obtained from the solution of the balance equations 
for the solute mass in a single grain. The balance of the solute mass for the 
indirect solidification model is as follows [7] 



mr] 






dt 






Ds „ 

m — Tj 



\t) 



dCs{r]{t),t) 






dCi 

dt 



mr] 



's 

\ty- 



dt 



Ci{t) = 0, 



( 10 ) 



where $m$ is a coefficient which equals 1 for plane, 2 for cylindrical and 3 for 
spherical coordinate systems, $C$ is the solute concentration, $\eta$ is the current 
thickness or radius of the solidified part of the grain, $r_g$ is the final thickness or 
final grain radius, $D_s$ is the solute diffusion coefficient in the solid phase and $\xi$ 
is the current coordinate. 




Fig. 1. The solid phase growth models in the two-component alloys (1 - non- 
equilibrium, 2 - equilibrium, 3 - indirect) 



The solution of Eq. (10), after introducing the so-called local so- 
lidification time $t_l$ and using the relationships obtained from the phase diagram 
(connecting the solute concentration with temperature), can be written as [4,5] 

$f_s(T) = \frac{1}{1 - nk\alpha}\left[1 - \left(\frac{T_M - T}{T_M - T_L}\right)^{\frac{1 - nk\alpha}{k - 1}}\right]$, (11) 

where $n$ is a coefficient accounting for the grain shape ($n = 2$ for a plane grain, $n = 4$ 
for a cylindrical (columnar) grain and $n = 6$ for a spherical grain) and $k$ is the solute 
partition coefficient. The $\alpha$ coefficient is defined as 

$\alpha = \frac{D_s t_l}{r_g^2}$. (12) 

The application of Eq. (11) gives physically unrealistic results for values of the $\alpha$ 
coefficient above a certain limit, which depends on the grain shape: 
the solid phase fraction becomes equal to 1 at temperatures higher than the solidus 
temperature. One can avoid this inconvenience by introducing an appropriate 
correction for the $\alpha$ value. In this paper the correction was introduced only for 
plane grains, that is for $n = 2$. It equals [8] 

$\Omega(\alpha) = \alpha\left(1 - \exp\left(-\frac{1}{\alpha}\right)\right) - \frac{1}{2}\exp\left(-\frac{1}{2\alpha}\right)$. (13) 

After the application of this correction the coefficient $\alpha$ can take any positive 
value, while the coefficient $\Omega$ takes values from 0 to 0.5. The 
correction consists in replacing the $\alpha$ coefficient with the $\Omega$ coefficient 
in Eq. (11). 

Substituting $\alpha = 0$ (no solute diffusion in the solid phase) into Eq. (11) one obtains 
the relationship of the solid phase fraction for the non-equilibrium solidification 
model, while $\alpha = 1/n$ gives the relationship of the solid phase fraction for the 
equilibrium solidification model. 
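
The two limiting cases can be checked numerically with a short sketch. It assumes the power-law form of Eq. (11) with exponent $(1 - nk\alpha)/(k-1)$, the correction (13) for plane grains, and illustrative Al-Cu values ($k = 0.125$, $T_M = 933$ K, $T_L = 926$ K); the function names are hypothetical.

```python
import math

def omega(alpha):
    """Correction (13) for plane grains (n = 2); maps alpha > 0 into (0, 0.5)."""
    return alpha * (1.0 - math.exp(-1.0 / alpha)) - 0.5 * math.exp(-0.5 / alpha)

def solid_fraction(T, alpha, k=0.125, T_M=933.0, T_L=926.0, n=2):
    """Eq. (11) with alpha replaced by omega(alpha); meaningful for T <= T_L.

    The exponent (1 - n*k*Omega)/(k - 1) is the assumed form of Eq. (11).
    """
    a = n * k * omega(alpha)
    ratio = (T_M - T) / (T_M - T_L)
    return (1.0 - ratio ** ((1.0 - a) / (k - 1.0))) / (1.0 - a)

# Very large alpha: Omega tends to 1/n = 0.5, and solidification ends at the
# equilibrium solidus T_M - (T_M - T_L)/k = 877 K for these input values.
print(solid_fraction(877.0, 1e6))
# Very small alpha: Omega tends to 0 and a liquid fraction remains at 877 K.
print(solid_fraction(877.0, 1e-3))
```

With these inputs the first call returns a value very close to 1, while the second stays below 1, so the remaining liquid continues to cool towards the eutectic temperature.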

4 Equiaxed Microstructure Modelling 

The extent of zones with different types of microstructure, as well as the charac- 
teristic dimensions of grains in those zones, depend on the degree of undercooling 
of the melt at the beginning of solidification. The undercooling depends on the 
rate at which heat is removed from the casting. Taking the melt undercooling 
into account directly leads to many numerical difficulties in the solidification 
model. A much better solution is to assume that solidification starts at the liquidus 
temperature and that the undercooling, represented by the cooling velocity, determines 
the characteristic dimensions of the resulting microstructure. 

In this paper it was assumed that only equiaxed microstructure is formed in 
the casting. The final grain radius then depends on the cooling velocity in 
the following form [5] 

$r_g = r_b\left(1 - \exp\left(-1/\dot{T}\right)\right)$, (14) 

where $r_b$ is the maximal grain radius in the calculated microstructure, while $\dot{T}$ 
is the average cooling velocity, calculated from the beginning of the cooling 
process until the liquidus temperature is reached. In Eq. (14) the maximal grain 
radius depends on the constitution of the casting alloy and should be established 
experimentally. 
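
As a small illustration of Eq. (14), the sketch below evaluates the grain radius for a slow and a fast cooling velocity. The default maximal radius of $5 \cdot 10^{-4}$ m is taken as an assumption from the example in Section 5, and the function name is hypothetical.

```python
import math

def grain_radius(cooling_rate, r_max=5e-4):
    """Eq. (14): final grain radius [m] for an average cooling velocity [K/s]."""
    return r_max * (1.0 - math.exp(-1.0 / cooling_rate))

for rate in (0.78, 29.88):
    print(f"cooling velocity {rate:6.2f} K/s -> r_g = {grain_radius(rate) * 1e6:6.1f} um")
```

The radius decreases monotonically with the cooling velocity: slow cooling gives grains of a few hundred micrometres, fast cooling gives grains of the order of ten micrometres.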






5 Example of Computer Simulation 

An example computer simulation was carried out for an Al-2% Cu alloy solidifying 
in a metal mould. This alloy was chosen because of its wide range of solidification 
temperatures (40 K). The following values of material properties were used in the 
calculation: $\rho_s = 2824$ and $\rho_l = 2498$ kg/m³, $c_s = 1077$ and $c_l = 1275$ J/kgK, 
$\lambda_s = 262$ and $\lambda_l = 104$ W/mK, $L = 390000$ J/kg and $k = 0.125$. A linear 
dependence of the thermal conductivity coefficient on temperature 
was assumed in the range from the liquidus temperature to the temperature of the 
end of solidification. The temperatures needed to carry out the numerical simulation 
were taken from the phase diagram for Al-Cu alloys. They are equal to: $T_M = 
933$ K, $T_L = 926$ K, $T_S = 886$ K and $T_E = 821$ K. 




Fig. 2. The analysed casting in the mould 

In the calculation it was assumed that the maximal grain radius equals 
$5 \cdot 10^{-4}$ m. The initial casting temperature was 960 K, while the initial mould 
temperature was 590 K. The analysed casting together with the mould is shown 
in Fig. 2. The region was divided into 8609 triangular finite elements with 
4659 nodes, of which 5815 elements and 3060 nodes lie in the casting. Con- 
tinuity conditions were assumed both at the contact between the casting and the 
mould and between the two parts of the mould. The heat exchange coefficient through the 
layer separating the casting from the mould was assumed to be equal to 
1000 W/m²K, while the heat exchange coefficient between the two parts of the mould 
was equal to 800 W/m²K. A boundary condition of the third type was imposed 
on the remaining boundaries. It was assumed that the ambient temper- 
ature equals 300 K, while the exchange coefficient with the environment equals 
100 W/m²K on the top and side boundaries and 50 W/m²K on the bottom 
boundary. In the calculation of the $\alpha$ coefficient it was assumed that the product 
$D_s t_l$ is equal to $6 \cdot 10^{-9}$ m², while the coefficient accounting for the grain shape 
equals 2. A time step of 0.05 s was applied. 

Fig. 3. The cooling curves in the chosen nodes of the casting 

The full solidification time equals 235 s. The diagrams showing the cooling 
curves in the chosen nodes of the finite element mesh display the differences in 
the solidification course in different regions of the casting (Fig. 3). Solidification 
proceeds rapidly in the nodes closest to the mould wall. The smallest grains are 
formed there (Fig. 4). There is a very wide range of temperatures for the end 
of solidification, from 876 K (10 K lower than the equilibrium solidus temperature) 
to the eutectic temperature. The values of the corrected coefficient $\Omega$ also vary widely, from 
0.0433 in the central regions to 0.4872 in the layers in contact with the metal 
mould. In turn, the cooling velocities vary from 0.78 to 29.88 K/s. Because the 
average radius is a function of the cooling velocity, there is a considerable difference in 
the grain sizes occurring in the casting. The radii of the smallest grains equal 
17.71 μm, while the radii of the biggest ones equal 368.30 μm. 
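
The reported grain radii and coefficient values can be cross-checked against Eqs. (12) and (13). The sketch below is only a consistency check under the assumption that the product $D_s t_l$ equals $6 \cdot 10^{-9}$ m² and that the reported coefficients are the corrected values $\Omega$; the function names are hypothetical.

```python
import math

D_S_T_L = 6e-9  # assumed value of the product D_s * t_l [m^2]

def alpha(r_g):
    """Eq. (12): alpha = D_s * t_l / r_g**2 for a final grain radius r_g [m]."""
    return D_S_T_L / r_g ** 2

def omega(a):
    """Eq. (13): corrected coefficient for plane grains (n = 2)."""
    return a * (1.0 - math.exp(-1.0 / a)) - 0.5 * math.exp(-0.5 / a)

for r_um in (17.71, 368.30):
    a = alpha(r_um * 1e-6)
    print(f"r_g = {r_um:7.2f} um   alpha = {a:8.4f}   Omega = {omega(a):.4f}")
```

Under these assumptions the smallest grains give a corrected coefficient of about 0.487 and the biggest ones about 0.044, close to the two extreme values quoted in the text; the small difference for the large grains may come from averaging over nodes.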

6 Summary 

A new method of numerical modelling of equiaxed microstructure forma- 
tion in solidifying two-component alloys was presented in this paper. The 
method is based on the so-called indirect solidification (solid 
phase growth) model. The indirect model, in contrast to the commonly used non- 
equilibrium and equilibrium solidification models, makes it possible to take grain 
sizes into consideration in the calculation of temperature fields and solidification 
kinetics. The main advantage of the indirect solidification model is that the tem- 
peratures of the end of solidification, determined by this model, can cover the 
complete range from the equilibrium solidus temperature to the eutectic tempera- 
ture. 

Fig. 4. The distribution of the average radii of grains [μm] 

Abstract. A Dirichlet problem for a singularly perturbed parabolic 
reaction-diffusion equation is considered on a segment and, in partic- 
ular, in a composite domain. The solution of such a problem exhibits 
boundary and transition (in the case of the composite domain) parabolic 
layers. For this problem we study classical difference approximations on 
sequentially locally refined meshes. The correction of the discrete solu- 
tions is performed only on the subdomains subjected to refinement (their 
boundaries pass through the grid nodes); uniform meshes are used in 
these adaptation subdomains. For a posteriori grid refinement we apply, 
as indicators, auxiliary functions majorizing the singular component of 
the solution. As is shown, in this class of finite difference schemes 
there exist no schemes which converge independently of the singular per- 
turbation parameter $\varepsilon$ (or $\varepsilon$-uniformly). We construct special schemes 
which allow us to obtain approximations that converge "almost $\varepsilon$- 
uniformly", i.e., with an error weakly depending on $\varepsilon$. 

1 Introduction 

For a wide variety of singular perturbation problems, special finite difference 
schemes which converge $\varepsilon$-uniformly have been well developed and analyzed in 
recent years (see, for example, [1-4]). Usually such numerical methods re- 
quire a priori information about singularities of the solution and are adapted 
to it in some way (e.g., by a priori refinement of meshes). On the other hand, an a posteriori 
technique is often used in computational practice for regular problems in order 
to improve the accuracy by local grid refinement in those (sufficiently small) 
subregions where the solution gradients are large (see, e.g., [5,6]). In this 
way, no a priori knowledge about the solution is required. To compute the im- 
proved solution, this method uses uniform meshes, which ensures the efficiency 
of calculations. For this reason, it would be of significant interest to develop 
such techniques for representative classes of singular perturbation problems. The 
author can mention only [7], in which a similar approach was first applied. 

* This research was supported in part by the Russian Foundation for Basic Research 
under grant No. 98-01-00362 and by the NWO grant dossiernr. 047.008.007. 

In the present paper we consider one approach to increasing the accuracy 
of numerical solutions for a parabolic singularly perturbed equation of reaction- 
diffusion type. We use standard finite difference approximations on locally refined 
grids. Note that, besides boundary layers, for $\varepsilon \to 0$ there appears a transition 
parabolic layer in the case of a composite domain. We apply two types of lo- 
cal grid refinement in regions of these singularities: either a priori, i.e., before 
all computations, or a posteriori, after certain computations and the analysis 
of intermediate solutions. These a posteriori methods, whose errors depend 
weakly on the parameter $\varepsilon$ (in other words, weakly sensitive methods), are an al- 
ternative to classical and special $\varepsilon$-uniform a priori ones. In contrast to $\varepsilon$-uniform 
methods, for which the use of meshes abruptly condensing in a parabolic bound- 
ary layer is necessary [1,8], weakly sensitive methods use only simple uni- 
form meshes. In contrast to classical schemes, which converge only if the mesh 
width is substantially less than the parameter $\varepsilon$ (which is very restrictive for the 
method), weakly sensitive schemes converge even for not too small values of $\varepsilon$. 

To construct a posteriori condensing meshes, we use indicator functions 
which are majorants for the singular component of the solution; these func- 
tions obey parabolic singularly perturbed equations. In [7], we made use of 
functions which are solutions of ordinary singularly perturbed equations; such 
indicators are too rough to determine exactly the subdomain sub- 
ject to refinement. It should be noted that boundary value problems in composite 
domains, i.e., problems with transition layers, were not considered in [7]. 

2 Problem Formulation 

2.1. In the domain $\overline{G}$ with boundary $S = \overline{G} \setminus G$, where 

$G = D \times (0,T]$, $D = \{x : 0 < x < d\}$, (1) 

we consider the following boundary value problem for the parabolic equation 

$L_{(2)}u(x,t) \equiv \left\{\varepsilon^2 a(x,t)\frac{\partial^2}{\partial x^2} - c(x,t) - p(x,t)\frac{\partial}{\partial t}\right\}u(x,t) = f(x,t)$, $(x,t) \in G$, (2) 

$u(x,t) = \varphi(x,t)$, $(x,t) \in S$. 

Here $a(x,t)$, $c(x,t)$, $p(x,t)$, $f(x,t)$, $(x,t) \in \overline{G}$, and $\varphi(x,t)$, $(x,t) \in S$, are sufficiently 
smooth functions, and also $a_0 \le a(x,t) \le a^0$, $c(x,t) \ge 0$, $p_0 \le p(x,t) \le p^0$, 
$(x,t) \in \overline{G}$, $a_0, p_0 > 0$; $\varepsilon$ is the singular perturbation parameter, $\varepsilon \in (0,1]$. 
Assume that $f(x,t)$ and $\varphi(x,t)$ satisfy sufficient compatibility conditions on the 
set $\gamma_0 = \Gamma \times \{t = 0\}$, $\Gamma = \overline{D} \setminus D$, i.e., at the corner points $(0,0)$, $(d,0)$. 

We suppose that the boundary $S$ consists of two parts, namely, $S = S_0 \cup S^L$, 
where $S_0$ is the lower base of the set $G$, $S_0 = \overline{S}_0$, and $S^L$ is the lateral boundary. As 
$\varepsilon \to 0$, parabolic boundary layers appear in a neighbourhood of $S^L$. 

2.2. Let us give a classical finite difference scheme for problem (2), (1) and 
discuss some difficulties arising in the numerical solution of this problem. On the 
set $\overline{G}$, we introduce the rectangular grid 

$\overline{G}_h = \overline{\omega}_1 \times \overline{\omega}_0$, (3) 

where $\overline{\omega}_1$ is a mesh on $[0,d]$ with an arbitrary distribution of its nodes satisfying only 
the condition¹ $h \le MN^{-1}$, where $h = \max_i h^i$, $h^i = x^{i+1} - x^i$, $x^i, x^{i+1} \in \overline{\omega}_1$; 
$\overline{\omega}_0$ is a uniform mesh on $[0,T]$ with step-size $h_t = TN_0^{-1}$. Here $N+1$ and $N_0+1$ 
are the numbers of nodes in the meshes $\overline{\omega}_1$ and $\overline{\omega}_0$. Special attention will be 
paid to the meshes 

$\overline{G}_h = \overline{G}_{h(3)}$, where $\overline{\omega}_1$ is a piecewise uniform mesh. (4) 

To solve the problem, we use the implicit scheme [9] 

$\Lambda z(x,t) \equiv \{\varepsilon^2 a(x,t)\delta_{\overline{x}\hat{x}} - c(x,t) - p(x,t)\delta_{\overline{t}}\}z(x,t) = f(x,t)$, $(x,t) \in G_h$, (5) 

$z(x,t) = \varphi(x,t)$, $(x,t) \in S_h$, 

where $G_h = G \cap \overline{G}_h$, $S_h = S \cap \overline{G}_h$; $\delta_{\overline{x}\hat{x}}z(x,t)$, $\delta_{\overline{t}}z(x,t)$ are the second and first 
difference derivatives, e.g., $\delta_{\overline{x}\hat{x}}z(x,t) = 2(h^i + h^{i-1})^{-1}[\delta_x - \delta_{\overline{x}}]z(x,t)$, $x = x^i$. 
We say that the numerical solution $z(x,t)$ converges almost $\varepsilon$-uniformly if 
for any arbitrarily small number $\nu > 0$ one can find a function $\lambda(\varepsilon^{-\nu}N^{-1}, N_0^{-1})$ 
such that 

$|u(x,t) - \overline{z}(x,t)| \le M\lambda(\varepsilon^{-\nu}N^{-1}, N_0^{-1})$, $(x,t) \in \overline{G}$, 

where $\overline{z}(x,t)$, $(x,t) \in \overline{G}$ is the linear (with respect to $x$ and $t$) interpolant 
constructed from $z(x,t)$, $(x,t) \in \overline{G}_h$; $\lambda(N^{-1}, N_0^{-1}) \to 0$ for $N, N_0 \to \infty$ $\varepsilon$- 
uniformly. In other words, the difference scheme converges almost $\varepsilon$-uniformly 
with defect $\nu$ (for $\nu = 0$ the scheme converges $\varepsilon$-uniformly). 

For the solution of scheme (5), (3) the following estimate holds: 

$|u(x,t) - z(x,t)| \le M[(\varepsilon + N^{-1})^{-1}N^{-1} + N_0^{-1}]$, $(x,t) \in \overline{G}_h$. (6) 

In the case of the difference scheme (5), (4) we have the estimate 

$|u(x,t) - z(x,t)| \le M[(\varepsilon + N^{-1})^{-2}N^{-2} + N_0^{-1}]$, $(x,t) \in \overline{G}_h$. (7) 

It follows from estimates (6) and (7) that the schemes under consideration con- 
verge if $N^{-1} = o(\varepsilon)$, or 

$\varepsilon^{-1} = o(N)$. (8) 

If this condition is violated, e.g., for $\varepsilon^{-1} = O(N)$, the solutions of schemes (5), 
(3) and (5), (4), generally speaking, do not converge to the solution of problem 
(2). This raises a theoretical problem: to construct 
special difference schemes whose errors do not depend on the parameter $\varepsilon$. In 
particular, it is of interest to develop schemes that converge under a weaker 
condition than condition (8). 

¹ Here and below $M$ denotes sufficiently large positive constants independent of $\varepsilon$ and 
the discretization parameters. In what follows, the notation $\overline{G}_{h(i.j)}$ ($L_{(i.j)}$, $d_{(i.j)}$) 
indicates that these grids (operators, numbers) are first defined in equation (i.j). 

2.3. In [1,8] the author introduced a special piecewise uniform mesh con- 
densing in the boundary layer, on which the scheme (5) converges $\varepsilon$-uniformly 
with rate $O(N^{-2}\ln^2 N + N_0^{-1})$. On the grid (3), where $\overline{\omega}_1$ is a Bakhvalov-type 
graded mesh from [10], the scheme converges with rate $O(N^{-2} + N_0^{-1})$. 

In several regular problems having local singularities, locally a priori or a 
posteriori refined meshes are used to improve the accuracy of numerical solu- 
tions [6]. A posteriori refined meshes are also attractive for singu- 
larly perturbed problems, in particular, for problem (2), (1). We consider some 
algorithms of local grid refinement and study their applicability to the construc- 
tion of approximate solutions with an error depending weakly on the parameter $\varepsilon$. 



3 On $\varepsilon$-Uniformly Convergent Difference Schemes 



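Let us describe a basic algorithm for constructing a mesh locally refined in the boundary 
layer region, and point out some relevant issues. To construct grids on 
the subdomains subject to refinement, we use uniform meshes in space and time. 
On the set $\overline{G}$ we introduce the uniform rectangular grid 

$\overline{G}_{1h} = \overline{\omega}_1 \times \overline{\omega}_0$, (1a) 

where $\overline{\omega}_1$ is a uniform mesh with step-size $h = dN^{-1}$, $\overline{\omega}_0 = \overline{\omega}_{0(2.3)}$. For conve- 
nience we denote the solution of (5) on the grid (1a) by $z_1(x,t)$, $(x,t) \in \overline{G}_{1h}$. 
Let two values $d_1^1$ and $d_1^2$ have been found in some way, $d_1^1, d_1^2 \in \overline{\omega}_1$, $d_1^1 \le d_1^2$, 
such that for $d_1^1 \le x \le d_1^2$ the grid solution $z_1(x,t)$, $(x,t) \in \overline{G}_{1h}$ is a satisfactory 
numerical approximation to the solution $u(x,t)$ of problem (1), (2). If it turns out 
that $d_1^1 > 0$, $d_1^2 < d$, then we define the subdomains $G^1_{(2)} = (0, d_1^1) \times (0,T]$, 
$G^2_{(2)} = (d_1^2, d) \times (0,T]$. 

On the subsets $\overline{G}^i_{(2)}$ we introduce the grids $\overline{G}^i_{(2)h} = \overline{\omega}^i_{(2)} \times \overline{\omega}_0$, $i = 1,2$, 
where $\overline{\omega}^i_{(2)}$ are uniform meshes, each with the number of nodes $N+1$. 
Let $z^i_{(2)}(x,t)$, $(x,t) \in \overline{G}^i_{(2)h}$ be the solution of the grid problem 

$\Lambda_{(2.5)} z^i_{(2)}(x,t) = f(x,t)$, $(x,t) \in G^i_{(2)h}$, 

$z^i_{(2)}(x,t) = z_1(x,t)$ for $(x,t) \in S^i_{(2)h} \setminus S$, $z^i_{(2)}(x,t) = \varphi(x,t)$ for $(x,t) \in S^i_{(2)h} \cap S$, $i = 1,2$, 

with $G^i_{(2)h} = G^i_{(2)} \cap \overline{G}^i_{(2)h}$, $S^i_{(2)h} = S^i_{(2)} \cap \overline{G}^i_{(2)h}$, $S^i_{(2)} = \overline{G}^i_{(2)} \setminus G^i_{(2)}$. Then 
we define the grid $\overline{G}_{2h}$ and the function $z_2(x,t)$, $(x,t) \in \overline{G}_{2h}$ by the relations: 
$\overline{G}_{2h} = \overline{G}^1_{(2)h} \cup \overline{G}^2_{(2)h} \cup \{\overline{G}_{1h} \setminus \{G^1_{(2)} \cup G^2_{(2)}\}\}$, and 

$z_2(x,t) = z^i_{(2)}(x,t)$ for $(x,t) \in \overline{G}^i_{(2)h}$, $i = 1,2$, 
$z_2(x,t) = z_1(x,t)$ for $(x,t) \in \overline{G}_{1h} \setminus \{G^1_{(2)} \cup G^2_{(2)}\}$; $(x,t) \in \overline{G}_{2h}$. 

Let the grid $\overline{G}_{k-1,h}$ and the function $z_{k-1}(x,t)$ on $\overline{G}_{k-1,h}$ have already been 
constructed for $k \ge 3$, and assume, similarly to what has been said above, that 
the grid solution $z_{k-1}(x,t)$, $(x,t) \in \overline{G}_{k-1,h}$ gives a satisfactory approximation 
to $u(x,t)$ for $d^1_{k-1} \le x \le d^2_{k-1}$. If $d^1_{k-1} > 0$, $d^2_{k-1} < d$, we define the domains 

$G^1_{(k)} = (0, d^1_{k-1}) \times (0,T]$, $G^2_{(k)} = (d^2_{k-1}, d) \times (0,T]$. (1b) 

On the sets $\overline{G}^i_{(k)}$ we introduce the grids 

$\overline{G}^i_{(k)h} = \overline{\omega}^i_{(k)} \times \overline{\omega}_0$, $i = 1,2$, (1c) 

where $\overline{\omega}^i_{(k)}$ are uniform meshes, each with the number of nodes $N+1$. 

Let $z^i_{(k)}(x,t)$, $(x,t) \in \overline{G}^i_{(k)h}$ be the solution of the grid problem 

$\Lambda_{(2.5)} z^i_{(k)}(x,t) = f(x,t)$, $(x,t) \in G^i_{(k)h}$, 

$z^i_{(k)}(x,t) = z_{k-1}(x,t)$ for $(x,t) \in S^i_{(k)h} \setminus S$, $z^i_{(k)}(x,t) = \varphi(x,t)$ for $(x,t) \in S^i_{(k)h} \cap S$, $i = 1,2$. (1d) 

Suppose $\overline{G}_{kh} = \overline{G}^1_{(k)h} \cup \overline{G}^2_{(k)h} \cup \{\overline{G}_{k-1,h} \setminus \{G^1_{(k)} \cup G^2_{(k)}\}\}$, 

$z_k(x,t) = z^i_{(k)}(x,t)$ for $(x,t) \in \overline{G}^i_{(k)h}$, $i = 1,2$, 
$z_k(x,t) = z_{k-1}(x,t)$ for $(x,t) \in \overline{G}_{k-1,h} \setminus \{G^1_{(k)} \cup G^2_{(k)}\}$. 

If for some values $i = j$ and $k = K_0(j)$, $j = 1,2$ it turns out that $d^1_{K_0(1)} = 0$ 
(or $d^2_{K_0(2)} = d$), then we suppose $d^1_k = 0$ for $k \ge K_0(1)$ ($d^2_k = d$ for $k \ge K_0(2)$ 
respectively); let $K_0 = \max[K_0(1), K_0(2)]$. For $k \ge K_0(j) + 1$ the sets $G^j_{(k)}$ 
are assumed to be empty, and we do not compute the functions $z^j_{(k)}(x,t)$. For 
example, for $k \ge K_0$ we have $z_k(x,t) = z_{K_0}(x,t)$, $\overline{G}_{kh} = \overline{G}_{K_0 h}$. 
For $k = K$, where $K \ge 1$ is a given fixed number, we set 

$z(x,t) = z_K(x,t)$, $(x,t) \in \overline{G}_h$, $\overline{G}_h = \overline{G}_{Kh}$. (1e) 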

We call the function $z_{(1e)}(x,t)$, $(x,t) \in \overline{G}_{h(1e)}$ the solution of scheme (5), 
(1). The given algorithm allows us to construct meshes condensing in the 
boundary layers. The number of nodes $N_K + 1$ in the mesh $\overline{\omega}_K$ generating $\overline{G}_{Kh}$ 
does not exceed $(2K-1)(N+1)$. Thus, the grid $\overline{G}_{Kh}$ belongs to the family $\overline{G}_{h(4)}$. 
Note that the solution of the intermediate problems (1d) requires no interpolation 
to define the functions $z^i_{(k)}(x,t)$ on the boundary $S^i_{(k)h}$. 

The grids $\overline{G}_{kh}$, $k = 1,\ldots,K$ generated by the algorithm are determined by 
the choice of the values $d^i_k$, $i = 1,2$, $k = 1,\ldots,K-1$, and also by the 
values $K$ and $N, N_0$. Thus, this algorithm determines the class of finite 
difference schemes (5), (1). In this class the boundary of the subdomain subject 
to refinement passes through the nodes of a coarser grid. Note that the small- 
est step of the mesh $\overline{\omega}_K$ is not less than $dN^{-K}$. The meshes generated by the 
algorithm in which the values $d^i_k$ and $K$ are defined before the start of com- 
putations (or in the course of calculations, by relying on intermediate results) 
belong to a priori (a posteriori) refined meshes. 
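
The nested construction of the spatial mesh can be sketched as follows. The rule used to pick the interfaces (refine the `m` outermost cells on each side at every level) is a hypothetical a priori choice made only for this illustration, and the bounds checked afterwards, at most $(2K-1)(N+1)$ nodes and a smallest step not below $dN^{-K}$, are the ones assumed from the text. Exact rational arithmetic is used so that the comparisons are not affected by rounding.

```python
from fractions import Fraction

def refined_mesh(d, N, K, m=1):
    """Nodes of the spatial mesh after K - 1 nested refinements near x = 0 and x = d.

    Hypothetical rule: at each level the m outermost cells on each side of the
    current mesh are replaced by a uniform mesh with N cells (N + 1 nodes).
    """
    d = Fraction(d)
    mesh = [d * i / N for i in range(N + 1)]            # uniform mesh of level 1
    for _ in range(2, K + 1):
        d1, d2 = mesh[m], mesh[-1 - m]                  # interfaces are nodes of the coarser mesh
        left = [d1 * i / N for i in range(N + 1)]       # uniform mesh on [0, d1]
        right = [d2 + (d - d2) * i / N for i in range(N + 1)]  # uniform mesh on [d2, d]
        middle = [x for x in mesh if d1 < x < d2]       # coarser nodes kept in between
        mesh = left + middle + right
    return mesh

mesh = refined_mesh(1, 4, 3)
steps = [b - a for a, b in zip(mesh, mesh[1:])]
print(len(mesh), min(steps))
```

Since every refined subdomain is bounded by nodes of the previous mesh, no interpolation is needed to transfer boundary values from the coarser level.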

The schemes from the class (5), (1) satisfy the maximum principle [9]. The 
following theorem states the "negative" result mentioned in the abstract. 

Theorem 1. In the class of difference schemes (5), (1) for the boundary value 
problem (2), (1) there exist no schemes that converge $\varepsilon$-uniformly. 

Remark 1. The statement of Theorem 1 remains valid when monotone finite 
element or finite volume operators are used to approximate the operator $L_{(2)}$. 

4 Schemes on a Priori Condensing Meshes 

In this section we construct finite difference schemes from the class (5), (1) by 
prescribing rules for choosing the values $d^i_k$, $K$. 

Let $K \ge 1$. We define the values 

d\ = ak, dl = d-ak, ak = (7k{N) = , fc = l,...,A", i = l,2, (1) 

where $\lambda$ is an arbitrary number from $(0,1)$. Then we get the following estimate
for the components $z_k(x,t)$ of the solution of scheme (5), (1), (1):

$|u(x,t) - z_k(x,t)| \le M_k \left[ \varepsilon^{-2} N^{-2(1+(k-1)\lambda)} + N^{-2+2\mu} + N_0^{-1} \right]$, $(x,t) \in G_{kh}$,

$k = 1, \ldots, K$, (2)

where $\mu$ is any number from the interval $(\lambda, 1)$. Note that $z_K(x,t) = z(x,t)$.

On the set $G_{kh}$ the $k$-th component $z_k(x,t)$ converges to the exact solution
$u(x,t)$ if the following condition is satisfied:

$\varepsilon^{-1} = o\left(N^{1+(k-1)\lambda}\right)$, $k = 1, \ldots, K$, $\lambda = \lambda_{(1)}$. (3)

Thus, for $K \ge 2$ the solution of the scheme and its components $z_k(x,t)$ converge,
respectively, on $G_h$ and $G_{kh}$ (for $k \ge 2$) under a condition weaker than (8).
But if for some $k$ the parameter $\varepsilon$ satisfies the condition

£G[£fc,l], ek = ek{N)=MN-^^, k = l,...,K, (4) 

where $\beta$ is an arbitrary number from the interval (0,p], then for the component
$z_k(x,t)$, $(x,t) \in G_{kh}$, the following estimate is valid:

\u{x,t) - Zk{x,f)\< Mk[N~'^^'^^^ + {x,t)GGkh, fc = l,...,AA (5) 

For sufficiently large $K$ satisfying the condition

$K \ge K_{(6)}(\nu, \lambda)$, $K_{(6)}(\nu, \lambda) = 1 + \lambda^{-1}\nu^{-1}(1-\nu)$, $\lambda = \lambda_{(1)}$, (6)

where $\nu > 0$ is an arbitrarily small number, the difference scheme (5), (1), (1)
converges almost ε-uniformly with defect $\nu$:

$|u(x,t) - z(x,t)| \le M_K \left[ (\varepsilon^{-\nu} N^{-1})^{2/\nu} + N^{-2+2\mu} + N_0^{-1} \right]$, $(x,t) \in G_h$. (7)

Theorem 2. Let the solution $u(x,t)$ of the boundary value problem (2), (1) and
its regular part $U(x,t)$ satisfy the inclusions $u \in$ …, $\alpha > 0$, $l > 6$,
and $U \in$ …, $\alpha > 0$, $l > 4$. Then (i) the solution of scheme (5), (1), (1)
and its components, viz. the functions $z(x,t)$, $(x,t) \in G_h$, and $z_k(x,t)$,
$(x,t) \in G_{kh}$, $k = 1, \ldots, K$, converge to the solution $u(x,t)$ of problem (2), (1) under
condition (3); (ii) scheme (5), (1), (1), (6) converges almost ε-uniformly with
defect $\nu$. The discrete solutions satisfy estimates (2), (7) and, besides, (5), if
condition (4) is fulfilled.
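
The threshold on $K$ in condition (6) can be checked by a short calculation. The following is a sketch that assumes the first term of estimate (2) has the form $\varepsilon^{-2} N^{-2(1+(k-1)\lambda)}$ and that the first term of estimate (7) is $(\varepsilon^{-\nu} N^{-1})^{2/\nu}$; it only compares exponents of $N$ for $N \ge 1$.

```latex
\varepsilon^{-2} N^{-2(1+(K-1)\lambda)}
  \le \left(\varepsilon^{-\nu} N^{-1}\right)^{2/\nu}
  = \varepsilon^{-2} N^{-2/\nu}
\iff 1 + (K-1)\lambda \ge \nu^{-1}
\iff K \ge 1 + \lambda^{-1}\left(\nu^{-1} - 1\right)
       = 1 + \lambda^{-1}\nu^{-1}(1-\nu).
```

Under these assumptions, the bound for the last component $z_K = z$ is dominated by the right-hand side of (7) as soon as the number of refinement levels reaches this threshold.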



5 Schemes on a Posteriori Condensing Meshes 



To construct schemes on a posteriori condensing meshes, we apply the algorithm
$A_{(1)}$, where we use, as indicators for computing the values $d_k^i$, auxiliary grid functions
which majorize the singular component of the problem solution.

First we decompose the solution $u(x,t)$ into regular and singular parts:
$u(x,t) = U(x,t) + V(x,t)$, $(x,t) \in \bar G$. We now estimate the function $V(x,t)$.

Let the function Uo{x, t) be the solution of the two ordinary differential equa- 
tions (on each of the sides Sjf, = Sf U S^)- 

{-P{x,t)^~ c{x,t)}Uo{x,t) = f{x,t), {x,t)GS^, 

UQ{x,f) = ip{x,t), {x,t) G Sq. 

Then the boundary-layer function V (x, t) is the solution of the problem 
L^ 2 )^(x,t) = 0, {x,t)GG, 

V{x,t) = <p^{x,t), (x,t) G S^, V{x,t) = 0, (x,t) G So, 



where (p^{x,t) = p{x,t) — Uo{x,t), (x,t) G S^. 

We represent the function ip^{x,t) as a sum of two functions (p^{x,t) = 

p^+{x,t) + p^-{x,t), {x,t)GS^, where t) > 0, (x,t) < 0, 

{x,t)GS^, fco = 0,l,2. By z=*=(x, t) we denote the solution of the problem 

= {e‘^6^^ - pSj} z^{x,t) = 0, {x,t) G Gh, 

z^{x,t) = g)^^{x,t), {x,t)GSf^, z^{x,t) = 0, (x,t) G Soh- 

where p = min^ [a~^{x,t)p{x,t)] . The grid functions z^{x,t) and z' (x,t), 
(x,t) G majorize the solution of problem (lb): 

z~(x,t) — < V{x,t) < z+(x,t) -I- MNq^, {x,t) G Gih. 

We define the values $d_k^i$ by

$d_k^1 = \sigma_k^1$, $d_k^2 = d - \sigma_k^2$, $k = 1, \ldots, K$. (3a)
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Let us determine cr^. Assume e < MN ^ j3 = j3^j^y Let cr^*, k = 
f = 1,2 be the minimal value of cr* for which the following inequality holds: 

z+\x,t)-zrix.t)<Mk[N-^+^^ + N^^], {x,t)GGl,^^, 1 = 1,2, 

< X < d — (3b) 

Here = Gih, zf'^(x,t) = z"^^{x,t), (x,t) € Gih, the functions z^^{x,t), 

{x, t) € G(fc)?u k = 2, ..., K, i = 1,2 are the solutions of the problems 

= ^ SQ(k)h^ k = 2,...,K, i = l,2. 

If for some index i = j the inequality (3b) is false for any value , we suppose 
ai* = 2~^d. In (3b) /r = /i(2)- For e > MN~^^ we take tr^* = af* = 0. Finally, 
we define a\ = min [ cr^* , ]. 

The solution of difference scheme (5), (1), (3) satisfies the estimate 

\u{x,t) - Zk{x,t)\ < Mk[N-‘^+‘^^^ + Nq^], {x,t)£Gkh, 

for al<x<d-al, k=l,...,K, ^ = y,^ 2 y 

For sufficiently large N and sufficiently small e we have the inequalities 

<(3)’ <*(3) ^ d = d-(2y k=l,...,K, i = l,2. (5) 

Taking (5) into account, for the functions $z_k(x,t)$, $(x,t) \in G_{kh}$, we obtain

$|u(x,t) - z_k(x,t)| \le M_k \left[ \varepsilon^{-2} N^{-2(1+(k-1)\lambda)} + N^{-2+2\mu} + N_0^{-1} \right]$, $(x,t) \in G_{kh}$,

$k = 1, \ldots, K$, $\mu = \mu_{(2)}$. (6)

The component $z_k(x,t)$, $(x,t) \in G_{kh}$, converges to the solution $u(x,t)$ of the
boundary value problem under condition (3). If condition (4) holds, we have

\u{x,t) - Zk{x,t)\< Mk[N~‘^+'^^^ + N^^], {x,t)GGkh, (7) 

7‘ = M(2)> fc = l,...,AT. 

It follows from estimates (6) that the solution of scheme (5), (1), (3), (6)
converges almost ε-uniformly with defect $\nu$ (below $K = K_{(6)}$):

$|u(x,t) - z(x,t)| \le M_K \left[ (\varepsilon^{-\nu} N^{-1})^{2/\nu} + N^{-2+2\mu} + N_0^{-1} \right]$, $(x,t) \in G_h$. (8)

Theorem 3. Let the hypothesis of Theorem 2 be fulfilled. Then (i) the functions
$z(x,t)$, $(x,t) \in G_h$, and $z_k(x,t)$, $(x,t) \in G_{kh}$, $k = 1, \ldots, K$, i.e., the solution of
scheme (5), (1), (3) and its components, converge to the solution of problem
(2), (1) under condition (3); (ii) scheme (5), (1), (3), (6) for $\mu = \mu_{(2)}$ converges
almost ε-uniformly with defect $\nu$. The discrete solutions satisfy estimates (4), (6),
(8) and, besides, (7), if condition (4) is fulfilled.

Remark 2. For sufficiently large $N$ and sufficiently small ε the upper bound (5)
is fulfilled. Thus, the schemes on a posteriori refined meshes defined by (3) are
more effective than the schemes on a priori refined meshes defined by (1). Note
that the use of indicator functions obeying singularly perturbed ODEs (see,
e.g., [7]) substantially overstates the values, which reduces the efficiency of the
numerical method.



6 Boundary Value Problem with a Transition Layer 



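6.1. In the composed domain $G$, where

$G = G^1 \cup G^2$, $G^r = D^r \times (0,T]$, $D^1 = (-d, 0)$, $D^2 = (0, d)$, (1)

it is required to find the solution of the problem

$L u(x,t) \equiv \left\{ \varepsilon^2 a(x,t) \frac{\partial^2}{\partial x^2} - c(x,t) - p(x,t) \frac{\partial}{\partial t} \right\} u(x,t) = f(x,t)$, $(x,t) \in G$, (2)

$[u(x,t)] = \left[ a(x,t) \frac{\partial}{\partial x} u(x,t) \right] = 0$, $(x,t) \in S^*$, $u(x,t) = \varphi(x,t)$, $(x,t) \in S$.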



Here $S^* = \{x = 0\} \times (0,T]$, $S = \bar G \setminus \{G \cup S^*\}$, $a(x,t) = a_r(x,t)$, ..., $f(x,t) = f_r(x,t)$, $(x,t) \in \bar G^r$, $r = 1,2$, and also $0 < a_0 \le a(x,t) \le a^0$, $0 \le c(x,t) \le c^0$,
$0 < p_0 \le p(x,t) \le p^0$, $(x,t) \in \bar G$. The coefficients and the data of the problem
are assumed to be sufficiently smooth. The symbol $[u(x,t)]$ denotes the jump of
the function $u(x,t)$ when crossing $S^*$:

$[u(x,t)] = \lim_{x_2 \to x} u(x_2,t) - \lim_{x_1 \to x} u(x_1,t)$,

$\left[ a(x,t) \frac{\partial}{\partial x} u(x,t) \right] = \lim_{x_2 \to x} a_2(x_2,t) \frac{\partial}{\partial x} u(x_2,t) - \lim_{x_1 \to x} a_1(x_1,t) \frac{\partial}{\partial x} u(x_1,t)$,

$x_r \in D^r$, $r = 1,2$, $(x,t) \in S^*$.



We consider that the compatibility conditions are fulfilled on the sets 70 = 

{(— d, 0)U(0, d)} and 7* = {(0, 0)} to ensure sufficient smoothness of the solution 
1 2 

u{x, t) on the subsets G , G for each e. 

As £: — > 0, parabolic boundary and transition layers appear in a neighbour- 
hood of the sets and S* respectively. 

6.2. On the set G, we introduce the grid 

Gh = oji X loq, (3) 

where lJq = ^o(3)’ ^ mesh on [— d, d] with N +l nodes. We denote the node 

X = 0 G u>i by . On the grid (3) we construct the difference scheme 



Az{x,t) = f{x,t), {x,t)GGh, z{x,t) = (f{x,t), (x,t) G Sh- (4) 
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Here 

A = e^a{x,t)6^s ~ c{x,t) -p{x,t)Sj, (x,t) €Gh\S*, 

A = e^2{jA‘° + ^{ 02 ( 0 ;, t)5x — ai{x, t)Sx}— c(x, t) — p{x, t)5j, (x, t) G 5'j^; 

J/(2)(a^>^)^ {x,t)eGh\S*, 

’~\f{x,t), (x,t)eSl 

We designate v{x,t) = (/i*“ + ^ (^h^°V 2 {x,t) — t)) , x = x*“=0. 
The difference scheme (4), (3) is £-uniformly monotone. 

In the case of the grids 



Gih — uji X Wo, 



(5a) 



uniform in x (or piecewise uniform with a finite number of intervals where the 
step-size is constant) we have the estimate 

I u{x,t) - z{x,t) I < M[(e -I- (x,f) G Gm, 



i.e., the scheme converges under condition (8). 

6.3. Similarly to scheme (5), (1), we construct the scheme on locally refined 
(in the boundary and transition layers) meshes replacing problem (5), (la) by 



problem (4), (5a), and the domains Gkh and also the values d], respectively 

by 



'^(k)h^ ^ 



kh 



and d, 



±Z 



(5b) 



where G(^f.)h — G'(fe)/i(l), G^h — d^^ — d^^iy G(^k)h^ G^h and 

constructed by the same way. The grids 



df. * are 



Gkh — Gf^f^U Gj^f^, fc — 1,...,AT, 



(5c) 



obtained by the algorithm ^( 5 p are determined by the choice of the values 

dl, f = l,...,4, k = l,...,K-l, (5d) 

where d^ = for i = 1,2, = dj^^~‘^ for i = 3,4. The solution of the 

difference scheme, viz. the function z{x,t), (x,t) G Gh, where Gh = Gf^ = Gkh, 
is defined by (le). 

In the case of a priori condensing meshes we define d^^g^ by 

dl = -d + ak, dfe = -cTfc, dl = <Tk, dl = d-Uk, ak = cJk{l)W- (b) 

Theorem 4. Let the solution u{x,t) of the boundary value problem (^), (7) and 
its regular part U{x,t) satisfy the inclusions r=l,2, a> 

0, Z>6 and t/G 6'^“'"“’ r=l,2, a>0, Z>4. Then (i) the functions 
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z{x,t), (x,t) € Gh and Zk(x,t), (x,t) € Gkh, k = i.e., the solution 

of scheme {4)> (5), (6) and its components, converge to the solution u{x,t) of 
problem (2), (!) under condition (5); (ii) scheme {4)> (5), (6), (6) converges 
almost e-uniformly with defect v. The discrete solutions satisfy estimates {2), 
(7) and, besides, (5), if condition (4) is fulfilled. 



6.4. Let us consider a scheme on a posteriori condensing meshes.

The solution of problem (2), (1) can be represented as a sum:
$u(x,t) = U(x,t) + V(x,t)$, $(x,t) \in \bar G^r$, $r = 1, 2$. Let us estimate the function $V(x,t)$.

Let the function $U_0(x,t)$, $(x,t) \in S^{Lr} \cup S^{*r}$, $r = 1,2$, be the solution of the
four ODEs (on each part $S^{Lr}$ forming $S^L$, $S^L = S^{L1} \cup S^{L2}$, and on each
side $S^{*r}$ of the interface boundary $S^*$, $S^* = S^{*1} \cup S^{*2}$):



{-p{x,t)^ - c{x,t)}Uo{x,t) = f{x,t), (x,t) G S^U 5*^ U S*'^, 

Uq{x, t) = if{x, t), (x, t) G { u s*^ u s*^ } n So- 

Then V(x,t) is the solution of the problem 



L(2)y{x,t) = 0 , 
V{x,t) = I 



(x,t) eG^ 

(fi^^{x,t), {x,t)GS^^, 

(p*^{x,t), {x,t)GS*^, 



V (x, t) = 0, (x,t) G Sg, r=l, 2, 



(7a) 



(7b) 



where (p^'^{x,t) = if{x,t) — Uo{x,t), {x,t) G (p*'^{x,t) = u{x,t) — Uo{x,f), 

(x, t) G S*^. Assume (p*(x, t) = C/q(x + 0, t) — Uq{x — 0, t), (x, t) G S*. 

Further we decompose the functions (p^''{x,f) and (p*{x,t) as follows: 



ip^^{x,t) = (f^^+{x,t) + (x,t), {x,t)GS^^, r=l,2, 

ip*{x,t) = ¥5*+(x,t) + if*~{x,t), (x,t) G S*, 

where 



^ko f^ko 

:^^’'+(x,t) > 0, ■^ip^’'~{x,t) <0, (x,t) G 



dt^° 



r = l,2; 



^^(/j*+(x,t) > 0, ^^:^*“(x,t) < 0, {x,t)GS*\ fco = 0,l,2. 

Note that the functions considered on the set S'*’’ are limiting on S* from G'’. 
By z^{x,t) we denote the solution of the problem 

^(2) (x,t)GGl, 

z^{x,t) = T{x,t), (x, t) G U 

z^{x,t) = 0, (x,t) G Sok, r = l,2. 



(8) 
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where 

^{x,t) = < (p*^{x,t), {x,t)GS*^, 

The functions z^{x,t), z~{x,t), {x,t) G Gf^ majorize the solution of problem 
(7b): 

z~{x,t)-M{e-NQ^) < V{x,t) < z~^{x,t)+M{e+Ng^), (x,t) G G^f^, r = l,2, 

where Glf^ = G'' H We use the functions z+(x,t), z~(x,t) as indicators. 

We now choose the control parameters in the following way:

$d_k^1 = -d + \sigma_k^1$, $d_k^2 = -\sigma_k^2$, $d_k^3 = \sigma_k^3$, $d_k^4 = d - \sigma_k^4$, $k = 1, \ldots, K$. (9a)

Let us determine cr^. Assume e < p = Let 

i = i(r), i = 1,2 for r = 1, and i = 3,4 for r = 2, be the minimal value of cr* 
for which the following inequality is fulfilled: 

z+\x,t) - z-p\x,t) < Mk[N~^+>^ + N~^], {x,t) gGII)^^, i = i{r), 

for — d + a\<x < —a1, <tI < x < d — . (9b) 

Here = G[^, z^\x,t) = z^^(x,t), (x,t) G G^f^, r = 1,2, the func- 
tions z^^(x,t), (x,t) G k = 2,...,K, i = i(r) are the solutions of the 

problems 

^(2) t) = t) e z(x, t) = z^l^ix, t), {x, t) G slk)h, 

z{x,t) = 0, (x,t) e k = 2,...,K, i = i{r), r = 1,2. 

If for some index i = j the inequality (9b) is false for any value , we suppose 
= 0, f = i(r). We take al = min[cr”, cr^(t) ] . 

We come to the following estimate for the solution of scheme (4), (5), (9): 

I u{x,t) - Zk{x,t) I < _|_ iVp-^], {x,t) G Gkh, 

k=l,...,K, ^ = /T(2), A = A(i). (10) 

Thus, the component Zk{x,t), (x,t) G Gkh converges to the solution of the 
boundary value problem under condition (3). Under condition (4) we have 

\u{x,t) - Zk{x,t)\ < Mk[N~^~^>^ + Nq^], {x,t)GGkh, k = l,...,K. (11) 

For the solution of scheme (4), (5), (9), (6) we obtain the estimate 

\u{x,t) - Zk{x,t) \ < MK[{e-^N~^y^'' + + N^^], (x,t) GGh, 

K=K^Qy 



( 12 ) 






Theorem 5. Let the hypothesis of Theorem 4 be fulfilled. Then (i) the functions
$z(x,t)$, $(x,t) \in G_h$, and $z_k(x,t)$, $(x,t) \in G_{kh}$, $k = 1, \ldots, K$, i.e., the solution of
scheme (4), (5), (9) and its components, converge to the solution $u(x,t)$ of the
boundary value problem (2), (1) under condition (3); (ii) scheme (4), (5), (9), (6)
for $\mu = \mu_{(2)}$ converges almost ε-uniformly with defect $\nu$. The discrete solutions
satisfy estimates (10), (12) and, besides, (11), if condition (4) is fulfilled.

Remark 3. For sufficiently large N and sufficiently small e the upper bound 
holds for t = 1, . . . , 4, k=l,...,K. 

Remark 4 . Let us assume that the function ip*{x,t) satisfies the condition: 
max t)| > me. In this case we apply scheme (4), (5), (9), where we re- 
place by in the right-hand side of (9b). Then estimates like (4), 

(6) and (8) are valid for the approximate solutions.

Abstract. We consider grid approximations of a boundary value problem
for the boundary layer equations modeling flow along a flat plate in
a region excluding a neighbourhood of the leading edge. The problem is
singularly perturbed with the perturbation parameter ε = 1/Re multiplying
the highest derivative. Here the parameter ε takes any value from
the half-open interval (0,1], and Re is the Reynolds number. It would be of
interest to construct an Re-uniform numerical method using the simplest
grids, i.e., uniform rectangular grids, that could provide effective computational
methods. To this end, we are free to use any technique, even up
to fitted operator methods, provided that the fitting factors are independent of
the problem solution. We show that for the Prandtl problem, even in the
case when its solution is self-similar, there does not exist a fitted operator
method that converges Re-uniformly. Thus, combining a fitted operator
and uniform meshes, we do not succeed in achieving Re-uniform convergence.
Therefore, the use of the fitted mesh technique, based on meshes
condensing in a parabolic boundary layer, is a necessity in constructing
Re-uniform numerical methods for the above class of flow problems.



1 Introduction 

The boundary layer equations for laminar flow are a suitable model for the
Navier-Stokes equations with large Reynolds numbers Re. Boundary value problems
for these nonlinear equations are singularly perturbed, with the perturbation
parameter ε defined by $\varepsilon = Re^{-1}$. The presence of parabolic boundary layers,
i.e., layers described by parabolic equations, is typical for such problems [1,2].

The application of numerical methods developed for regular boundary value
problems (see, for example, [3,11]), even to linear singularly perturbed problems,
yields errors which essentially depend on the perturbation parameter ε. For small
values of ε, the errors in such numerical methods may be comparable to, or even
much larger than, the exact solution. This behaviour of the approximate solutions
creates the need to develop numerical methods with errors that are independent
of the perturbation parameter ε, i.e. ε-uniform methods.

The presence of a nonlinearity in the differential equations makes it considerably
more difficult to construct ε-uniform numerical methods. For example,
even in the case of ordinary differential quasilinear equations, ε-uniform fitted
operator methods (see, e.g., [5,6]) do not exist. It should be pointed out that
even for linear singularly perturbed problems with parabolic boundary layers
there are no ε-uniform fitted schemes (see, for example, [7-9]). Thus, the development
of special ε-uniform numerical methods for resolving the Navier-Stokes
and boundary layer equations has considerable scientific and practical interest.

In this paper, we consider grid approximations of a boundary value problem
for boundary layer equations for a flat plate on a bounded domain outside a
neighbourhood of its leading edge. The solution of the Prandtl problem is self-similar
and exhibits a parabolic boundary layer in the considered domain. We
study a wide class of discrete approximations consistent with the differential
equations, i.e., the coefficients in the finite difference operators related to the
differential coefficients do not depend on the problem solution. It is shown that
the use of special meshes condensing in the boundary layer region is necessary.
Also, no technique for the construction of the discrete equations leads to an
ε-uniform method, unless it uses condensing grids.

2 Problem Formulation 

Let us formulate a boundary value problem for Prandtl's boundary layer equations
on a bounded domain. Consider a flat semi-infinite plate in the place of
the semiaxis $P = \{(x,y) : x \ge 0,\ y = 0\}$. The problem is considered to be
symmetric with respect to the plane $y = 0$; we discuss the steady flow of an
incompressible fluid on both sides of $P$, which is laminar and parallel to the
plate (no separation occurs on the plate).

As is well known, singularities in such a problem arise for large Reynolds numbers.
A typical singularity is the appearance of a parabolic boundary layer in a
neighbourhood of the flat plate outside some neighbourhood of its leading edge.
In a neighbourhood of the leading edge, another type of singularity is generated
because the compatibility conditions are violated at the leading edge. In order
to concentrate on the boundary layer region under consideration, we skip a small
neighbourhood of the leading edge.

We consider the solution of the problem on the bounded set

$\bar G$, where $G = \{(x,y) : x \in (d_1, d_2],\ y \in (0, d_0)\}$, $d_1 > 0$, (1)

with the boundary $S = \bar G \setminus G$. Let $G^0 = \{(x,y) : x \in [d_1, d_2],\ y \in (0, d_0]\}$;
$\bar G^0 = \bar G$; and let $S^0 = \bar G^0 \setminus G^0$ be the boundary of the set $G^0$. Assume $S = \cup S_j$,
$j = 0,1,2$, where $S_0 = \{(x,y) : x \in [d_1, d_2],\ y = 0\}$, $S_1 = \{(x,y) : x = d_1,\ y \in (0, d_0]\}$,
$S_2 = \{(x,y) : x \in (d_1, d_2],\ y = d_0\}$, $\bar S_0 = S_0$. Thus, the
boundary $S^0 = S_0$ belongs to $P$. On the set $\bar G$, it is required to find a function
$U(x,y) = (u(x,y), v(x,y))$ which is the solution of the following Prandtl problem:

$L^1(U(x,y)) \equiv \varepsilon \frac{\partial^2}{\partial y^2} u(x,y) - u(x,y) \frac{\partial}{\partial x} u(x,y) - v(x,y) \frac{\partial}{\partial y} u(x,y) = 0$, $(x,y) \in G$, (2a)

$L^2 U(x,y) \equiv \frac{\partial}{\partial x} u(x,y) + \frac{\partial}{\partial y} v(x,y) = 0$, $(x,y) \in G^0$, (2b)

$u(x,y) = \varphi(x,y)$, $(x,y) \in S$, (2c)

$v(x,y) = \psi(x,y)$, $(x,y) \in S^0$. (2d)

Here $\varepsilon = Re^{-1}$; the parameter ε takes arbitrary values in the half-open interval (0,1].
We now wish to define the functions $\varphi(x,y)$ and $\psi(x,y)$ more precisely.

In the quarter plane

$\Omega$, where $\Omega = \{(x,y) : x, y > 0\}$, (3)

we consider the following Prandtl problem which has a self-similar solution [1]:

$L^1(U(x,y)) = 0$, $(x,y) \in \Omega$, $L^2 U(x,y) = 0$, $(x,y) \in \bar\Omega \setminus P$, (4)

$u(x,y) = u_\infty$, $x = 0$, $y > 0$, $U(x,y) = (0,0)$, $(x,y) \in P$.

The solution of problem (4), (3) can be written in terms of some function $f(\eta)$
and its derivative:

$u(x,y) = u_\infty f'(\eta)$, $v(x,y) = \varepsilon^{1/2} (2^{-1} u_\infty x^{-1})^{1/2} (\eta f'(\eta) - f(\eta))$, (5)

where $\eta = \varepsilon^{-1/2} (2^{-1} u_\infty x^{-1})^{1/2} y$. The function $f(\eta)$ is the solution of the
Blasius problem

$L(f(\eta)) \equiv f'''(\eta) + f(\eta) f''(\eta) = 0$, $\eta \in (0, \infty)$, (6)

$f(0) = f'(0) = 0$, $\lim_{\eta \to \infty} f'(\eta) = 1$.

The functions $\varphi(x,y)$, $\psi(x,y)$ are defined by¹

$\varphi(x,y) = u_{(5)}(x,y)$, $(x,y) \in S$; $\psi(x,y) = v_{(5)}(x,y)$, $(x,y) \in S^0$; (7)

note that $\varphi(x,y) = 0$, $\psi(x,y) = 0$, $(x,y) \in S^0$.

¹ Here and below the notation $w_{(j.k)}$ indicates that $w$ is first defined in equation (j.k).

In the case of problem (2), (7), (1), as ε tends to zero, a parabolic boundary
layer appears in a neighbourhood of the set $S^0$.

To solve problem (2), (7), (1) numerically, we wish to construct an ε-uniform
finite difference scheme.
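
To make the self-similar boundary data concrete, the Blasius problem $f''' + f f'' = 0$, $f(0) = f'(0) = 0$, $f'(\infty) = 1$ can be solved numerically by shooting on $f''(0)$. The following Python sketch is not taken from the paper: the function names, the truncation of the half-line at $\eta = 10$, the fixed-step RK4 integrator, and the bisection bracket are all assumptions chosen for illustration. The `velocity` helper evaluates $u$ and $v$ using the standard Blasius scaling $\eta = y\,(u_\infty / (2 \varepsilon x))^{1/2}$ and is adequate only for moderate values of $\eta$.

```python
import math


def blasius_rhs(state):
    """Right-hand side of f''' + f f'' = 0 written as a first-order system."""
    f, fp, fpp = state
    return (fp, fpp, -f * fpp)


def integrate(fpp0, eta_max=10.0, steps=1000):
    """Classical RK4 from eta = 0 with f(0) = f'(0) = 0 and f''(0) = fpp0."""
    h = eta_max / steps
    y = (0.0, 0.0, fpp0)
    for _ in range(steps):
        k1 = blasius_rhs(y)
        k2 = blasius_rhs(tuple(a + 0.5 * h * b for a, b in zip(y, k1)))
        k3 = blasius_rhs(tuple(a + 0.5 * h * b for a, b in zip(y, k2)))
        k4 = blasius_rhs(tuple(a + h * b for a, b in zip(y, k3)))
        y = tuple(a + h / 6.0 * (p + 2.0 * q + 2.0 * r + s)
                  for a, p, q, r, s in zip(y, k1, k2, k3, k4))
    return y


def shoot(lo=0.1, hi=1.0, tol=1e-10):
    """Bisection on f''(0) so that f'(eta_max) = 1 (f' grows with f''(0))."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if integrate(mid)[1] < 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def velocity(x, y, eps, u_inf, fpp0):
    """Evaluate (u, v) of the self-similar solution at x > 0, y >= 0."""
    scale = math.sqrt(0.5 * u_inf / x)
    eta = scale * y / math.sqrt(eps)
    f, fp, _ = integrate(fpp0, eta_max=eta)
    return u_inf * fp, math.sqrt(eps) * scale * (eta * fp - f)


if __name__ == "__main__":
    fpp0 = shoot()
    print(fpp0, velocity(1.0, 1.0, 1e-2, 1.0, fpp0))
```

The computed $v$ carries the factor $\varepsilon^{1/2}$, which illustrates why the normalized component $\varepsilon^{-1/2} v$, rather than $v$ itself, is of order one.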



3 Classical Difference Scheme for the Prandtl Problem 

For the boundary value problem (2), (7), (1) we use a classical finite difference
scheme. At first we introduce the rectangular grid on the set $\bar G$:

$\bar G_h = \bar\omega_1 \times \bar\omega_2$, (1)

where $\bar\omega_1$ and $\bar\omega_2$ are meshes on the segments $[d_1, d_2]$ and $[0, d_0]$, respectively;
$\bar\omega_1 = \{x^i : i = 0, \ldots, N_1,\ x^0 = d_1,\ x^{N_1} = d_2\}$, $\bar\omega_2 = \{y^j : j = 0, \ldots, N_2,\ y^0 = 0,\ y^{N_2} = d_0\}$;
$N_1 + 1$ and $N_2 + 1$ are the numbers of nodes in the meshes $\bar\omega_1$
and $\bar\omega_2$. Define $h_1^i = x^{i+1} - x^i$, $x^i, x^{i+1} \in \bar\omega_1$, $h_2^j = y^{j+1} - y^j$, $y^j, y^{j+1} \in \bar\omega_2$,
$h_1 = \max_i h_1^i$, $h_2 = \max_j h_2^j$, $h = \max[h_1, h_2]$. We assume that² $h \le M N^{-1}$,
where $N = \min[N_1, N_2]$.

We approximate the boundary value problem by the difference scheme

$\Lambda^1(U^h(x,y)) \equiv \varepsilon \delta_{\bar y \hat y} u^h(x,y) - u^h(x,y) \delta_{\bar x} u^h(x,y) - v^h(x,y) \delta_{\bar y} u^h(x,y) = 0$, $(x,y) \in G_h$, (2a)

$\Lambda^2 U^h(x,y) \equiv \delta_{\bar x} u^h(x,y) + \delta_{\bar y} v^h(x,y) = 0$, $(x,y) \in G_h^0$, $x > d_1$,

$\Lambda^2 U^h(x,y) \equiv \delta_x u^h(x,y) + \delta_{\bar y} v^h(x,y) = 0$, $(x,y) \in S_{1h}$; (2b)

$u^h(x,y) = \varphi(x,y)$, $(x,y) \in S_h$, (2c)

$v^h(x,y) = \psi(x,y)$, $(x,y) \in S_h^0$. (2d)

Here $\delta_{\bar y \hat y} z(x,y)$ and $\delta_x z(x,y)$, ..., $\delta_{\bar y} z(x,y)$ are the second and first (forward
and backward) difference derivatives (the bar denotes the backward difference),
as follows: $\delta_{\bar y \hat y} z(x,y) = 2(h_2^{j-1} + h_2^j)^{-1} (\delta_y z(x,y) - \delta_{\bar y} z(x,y))$,
$\delta_x z(x,y) = (h_1^i)^{-1} (z(x^{i+1},y) - z(x,y))$, $\delta_{\bar y} z(x,y) = (h_2^{j-1})^{-1} (z(x,y) - z(x,y^{j-1}))$,
$(x,y) = (x^i, y^j)$.

The difference scheme (2), (1) approximates problem (2), (1) with the first
order of accuracy for a fixed value of the parameter ε.
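
The first and second difference derivatives on a nonuniform mesh can be sketched as follows. This is an illustrative Python sketch; the function names and the list-based representation of the mesh `y` and the grid function `z` are assumptions made here. The second difference uses the factor $2 / (h^{j-1} + h^j)$, where $h^{j-1} + h^j = y^{j+1} - y^{j-1}$.

```python
def backward_diff(z, y, j):
    """Backward difference (z_j - z_{j-1}) / h^{j-1}."""
    return (z[j] - z[j - 1]) / (y[j] - y[j - 1])


def forward_diff(z, y, j):
    """Forward difference (z_{j+1} - z_j) / h^j."""
    return (z[j + 1] - z[j]) / (y[j + 1] - y[j])


def second_diff(z, y, j):
    """Second difference 2 (h^{j-1} + h^j)^{-1} (forward - backward)."""
    return 2.0 * (forward_diff(z, y, j) - backward_diff(z, y, j)) / (y[j + 1] - y[j - 1])


if __name__ == "__main__":
    y = [0.0, 0.1, 0.3, 0.7]          # nonuniform mesh
    z = [t * t for t in y]            # grid function z = y^2
    print(second_diff(z, y, 1), second_diff(z, y, 2))
```

For a quadratic grid function the second difference reproduces the second derivative exactly on any mesh, whereas the one-sided first differences are only first-order accurate, in line with the first-order accuracy of the scheme.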

When the "coefficients" multiplying the differences $\delta_{\bar x}$ and $\delta_{\bar y}$ in the operator
$\Lambda^1$ are known (let these be the functions $u_0(x,y)$ and $v_0(x,y)$), and if they
satisfy the condition $u_0(x,y), v_0(x,y) \ge 0$, $(x,y) \in \bar G_h$, we know that the
operator $\Lambda^1$ is monotone [4]. In such a case we say that the discrete momentum
equation (2a) is monotone.

It is known that even for linear singularly perturbed problems, if we consider
the heat equation with $y$ and $x$ being the space and time variables, the errors of
the discrete solution depend on the perturbation parameter and become comparable
with the exact solution itself when the value $\varepsilon^{1/2}$ has the same order of
magnitude as the mesh step-size $h_2$ on uniform grids. Therefore, it is not surprising
that for the solution of the difference scheme (2), (1) we have the following
lower bounds

$\max_{G_h} |u(x,y) - u^h(x,y)| \ge m$, $\max_{G_h} |v^*(x,y) - v^{*h}(x,y)| \ge m$.

Here $v^*(x,y) = \varepsilon^{-1/2} v(x,y)$, $v^{*h}(x,y) = \varepsilon^{-1/2} v^h(x,y)$. We call the functions
$u(x,y)$, $u^h(x,y)$ and $v^*(x,y)$, $v^{*h}(x,y)$ the normalized components of
the solutions of problems (2), (7), (1) and (2), (1). Note that the functions
$u(x,y)$, $v^*(x,y)$ are ε-uniformly bounded on $\bar G$ and have order of unity, whereas
the function $v(x,y)$ tends to zero when $\varepsilon \to 0$.

² Throughout this paper M (m) denote sufficiently large (small) positive constants
independent of ε and the discretization parameters.

Thus, our aim is to try to find a numerical method which yields the discrete 
solutions vP^{x,y), v^*^{x,y) satisfying the error estimates 

\u{x,y)-u^\x,y)\ < M ] , (3) 

|u*(x, y) - v^*\x, y)\<M[ ] , (x, y) e G° (4) 

where is some grid on G, and y), y), (x, y) G G° is the solution of 

some discrete problem on Gj^ , and V 2 are any positive £-independent numbers. 
Throughout this paper M (m) denote sufficiently large (small) positive constants 
which do not depend on e and on the discretization parameters. 

We say that the method is ε-uniform if the errors in the discrete normalized
solutions are independent of the parameter ε. By a robust layer-resolving
method we mean a numerical method that generates approximate normalized
solutions that are globally defined, pointwise-accurate and parameter-uniformly
convergent at each point of the domain, including the boundary layers. Thus,
the errors of these normalized solutions in the $L_\infty$-norm are independent of ε;
they depend only on $N_1$ and $N_2$ and tend to zero when $N_1, N_2 \to \infty$.

Here we try to find a robust layer-resolving numerical method for the particular
problem (2), (7), (1). It would be attractive to find a method using uniform
grids. Obviously the method (2), (1) is too simple to be a robust layer-resolving
method.

4 On Fitted Operator Schemes for the Prandtl Problem 

In order to have as much freedom as possible in the construction of an ε-uniform
method, we use a fitted operator method. Such a method admits any technique
to be used to construct discrete approximations to the solution of problem (2),
(7), (1).

Before we proceed, we make a few remarks.

As was shown in [8,10] (see also [7,9,11]) for a singularly perturbed parabolic
equation with parabolic boundary layers, there exist no fitted operator schemes
on uniform meshes that are ε-uniform. Note that the coefficients in the terms
with first-order derivatives in time and second-order derivatives in the space
variables do not vanish in the equations discussed in [8,10]. However, for the
Prandtl problem the coefficient multiplying the first derivative with respect to
the variable $x$, which plays the role of the time variable, vanishes on the boundary
lying on the $x$-axis. Unlike the problems studied in [8,10], where the boundary
conditions do not obey any restriction besides the requirement of sufficient
smoothness, problem (2), (7), (1) is essentially simpler. Its solution depends only
on the one parameter $u_\infty$. In [12] an ε-uniform fitted operator method was constructed
for a linear parabolic equation with a discontinuous initial condition in
the presence of a parabolic (transient) layer. Such fitted operator schemes have
been successfully constructed because all of the singular components of the solution
(their main parts) are defined, up to some multiplier, by just one function.
Because of the simple (depending on $u_\infty$ only) representation of the solution
for the Prandtl problem, it is not obvious that for this problem there are no
ε-uniform fitted operator schemes. So it is of interest to establish whether such
fitted operator schemes on uniform meshes do exist for the Prandtl problem.

We try to construct a fitted operator scheme starting from equation (2a)
under the simplifying assumption that the function $v^h(x,y)$ is known, and also
that $v^h(x,y) = v(x,y)$. Let us consider a fitted operator scheme of the form

$\varepsilon \gamma_{(2)} \delta_{\bar y \hat y} u^h(x,y) - u^h(x,y) \delta_{\bar x} u^h(x,y) - \gamma_{(1)} v(x,y) \delta_{\bar y} u^h(x,y) = 0$, $(x,y) \in G_h$, (1a)

$u^h(x,y) = \varphi(x,y)$, $(x,y) \in S_h$, (1b)

where

$\bar G_h$ is a uniform rectangular grid (2)

with steps $h_1$ and $h_2$ in $x$ and $y$ respectively; the parameters

$\gamma_{(i)} = \gamma_{(i)}(x,y;\varepsilon,h_1,h_2)$, $i = 1,2$, (1c)

are the fitting coefficients. The discrete equation (1a) is based on the classical
monotone discrete momentum equation (2a). We emphasize that $\gamma_{(i)}$ are independent
of the unknown solution that is defined by the parameter $u_\infty$. Note that
equation (2a) is a particular grid equation from the class of discrete approximations
(1a), (1c), namely, when $\gamma_{(i)} = 1$, $i = 1,2$.

Relying on a priori estimates for the solution of problem (2), (7), (1) and
its derivatives, we establish, in a similar manner as in [8,13], that there is no
ε-uniform fitted operator scheme of the form (1), (2).

Theorem 1. In the class of finite difference schemes (1), (2) there exists no
scheme whose solutions converge ε-uniformly, as $N \to \infty$, to the solution of the
boundary value problem (2), (7), (1).

Remark 1. We conclude that to construct an ε-uniform scheme for the Prandtl
problem (2), (1), provided that the coefficients $\gamma_{(i)}$ are independent of the problem
solution, it is necessary to use meshes condensing in the neighbourhood
of the parabolic boundary layer. This holds regardless of whether finite elements or finite
differences are used.

5 Condensing Mesh Technique

Here we briefly describe the approach to the construction of an ε-uniform method
with piecewise uniform condensing meshes that originated in [8].

We introduce a piecewise uniform mesh, which is refined in a neighbourhood
of the boundary layer, i.e. of the set $S^0$. On the set $\bar G$, we consider the grid

$\bar G_h^* = \bar\omega_1 \times \bar\omega_2^*$, (1)

where $\bar\omega_1$ is a uniform mesh on $[d_1, d_2]$, and $\bar\omega_2^* = \bar\omega_2^*(\sigma)$ is a special piecewise uniform
mesh depending on the parameter $\sigma$ and the value $N_2$. The mesh $\bar\omega_2^*$ is
constructed as follows. We divide the segment $[0, d_0]$ in two parts $[0, \sigma]$ and $[\sigma, d_0]$.
The step-size of the mesh $\bar\omega_2^*$ is constant on the segments $[0, \sigma]$ and $[\sigma, d_0]$, given
by $h_2^{(1)} = 2\sigma N_2^{-1}$ and $h_2^{(2)} = 2(d_0 - \sigma) N_2^{-1}$, respectively. The value of $\sigma$ is defined
by

$\sigma = \min[2^{-1} d_0,\ m\,\varepsilon^{1/2} \ln N_2]$,

where $m$ is an arbitrary positive number.
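
The construction of the piecewise uniform mesh in $y$ can be sketched as follows. This is a minimal Python sketch under stated assumptions: the function name `piecewise_uniform_mesh` is hypothetical, $N_2$ is taken to be even so that each of the two segments receives $N_2/2$ intervals, and the transition point is assumed to be $\sigma = \min[d_0/2,\ m\,\varepsilon^{1/2} \ln N_2]$.

```python
import math


def piecewise_uniform_mesh(d0, n2, eps, m=1.0):
    """Return (sigma, nodes) for a mesh on [0, d0] condensing near y = 0.

    Assumes n2 is even: n2/2 intervals on [0, sigma] and n2/2 on [sigma, d0].
    """
    if n2 % 2 != 0:
        raise ValueError("n2 must be even")
    sigma = min(0.5 * d0, m * math.sqrt(eps) * math.log(n2))
    half = n2 // 2
    h_fine = 2.0 * sigma / n2            # step on [0, sigma]
    h_coarse = 2.0 * (d0 - sigma) / n2   # step on [sigma, d0]
    nodes = [i * h_fine for i in range(half + 1)]
    nodes += [sigma + i * h_coarse for i in range(1, half + 1)]
    return sigma, nodes


if __name__ == "__main__":
    sigma, nodes = piecewise_uniform_mesh(1.0, 16, 1e-4)
    print(sigma, nodes[1] - nodes[0], nodes[-1] - nodes[-2])
```

When ε is of order one the minimum selects $\sigma = d_0/2$ and the mesh is uniform; for small ε half of the nodes are placed in a strip of width $\sigma$ next to the plate.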

In the case of the boundary value problem (2), (7), (1), it is required to
study whether the solutions of the difference scheme (2), (1) converge to the
exact solution.

We mention certain difficulties that arise in the analysis of the convergence.
Note that the difference scheme (2), (1), as well as the boundary value problem
(2), (1), is nonlinear. To find an approximate solution of this scheme, we
must construct an appropriate iterative numerical method. It is of interest to
investigate the influence of the parameter ε upon the number of iterations required
for the convergence of this iterative process.

In the case of ε-uniform difference schemes for linear singular perturbation
problems, methods are well developed to determine theoretically and numerically
the parameters in the error bounds (orders of convergence and error constants)
for fixed values of ε and also ε-uniformly (see and compare, for example, [14,15]).
In this technique, ε-uniform convergence is ascertained from theoretical investigations.
Formally these methods are inapplicable to problem (2), (7), (1) because
ε-uniform convergence of scheme (2), (1) was not justified by theory. Nevertheless,
the results of such investigations of the error bounds seem to be interesting
for practical use.

In [16] we give an experimental technique to study if this method is ε-uniform,
and if so, to find the ε-uniform order of convergence. Some relevant ideas are
discussed in [13]. This technique is used in [16] to analyze the scheme (2), (1). It
was shown that this scheme gives discrete solutions that allow us to approximate
the normalized component and its first derivatives in $x$, $y$ for problem (2), (7),
(1). The order of ε-uniform convergence is close to one.

6 Conclusion

We have discussed the problems that arise when a direct method is designed
to solve the Prandtl problem for flow along a flat plate. We have shown that,
using a uniform mesh, it is impossible to construct an Re-uniform method if the
coefficients of the fitted operator are independent of the problem solution. For
such operators, in particular, for classical finite difference operators, the use of
grids condensing in the boundary layer region is necessary for the method
to be Re-uniform.

Abstract. We consider a finite difference scheme, called Quickest, introduced
by Leonard in 1979, for the convection-diffusion equation. Quickest
uses an explicit, Leith-type differencing and third-order upwinding on the
convective derivatives, yielding a four-point scheme. For that reason the
method requires careful treatment on the inflow boundary: we need to
introduce numerical boundary conditions, and these can lead to instability
phenomena. The stability region is found with the help of one of the most
powerful methods for local analysis of the influence of boundary conditions,
the Godunov-Ryabenkii theory.
1 Introduction 

Quickest is a finite difference scheme due to Leonard [8], who derived it using
control volume arguments. Davis and Moore [2] have shown that Quickest can also
be derived by considering the Taylor expansion of the time derivative and making
some subsequent approximations. Morton and Sobey [10], using the exact solution
of the convection-diffusion equation, derived Quickest based on a cubic local
approximation. The Quickest scheme uses an explicit, Leith-type differencing and
third-order upwinding on the convective derivatives, yielding a four-point scheme.
In the limit $D \to 0$ it is third order accurate in time. The use of third-order
upwind differencing for convection greatly reduces the numerical diffusion
associated with first-order upwinding [1]. Some of the literature about Quickest
used in flow simulation can be found in [1, 2, 6, 8, 9]. The major difficulties
associated with the use of the Quickest scheme in multidimensions lie in the
application of boundary conditions, which is the main reason to study the
influence of a numerical boundary condition on the stability of the numerical
scheme.

Fourier analysis is the standard method for analysing the stability of
discretisations of an initial value problem on a regular structured grid. This
model problem has Fourier eigenmodes whose stability needs to be analysed. If
they are stable at all points in the grid, and the discretisation of the boundary
conditions is also stable, then for most applications the overall discretisation
is stable, in the sense of Lax [12].

The influence of the boundaries can be analysed using the Godunov-Ryabenkii
theory. The Godunov-Ryabenkii theory was introduced by Godunov and Ryabenkii [3]
and developed by Kreiss [7], Osher [11] and Gustafsson et al. [4] (now also
called GKS theory). In this paper we find the stability region for the Quickest
scheme subject to a numerical boundary condition by applying the
Godunov-Ryabenkii theory.

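Consider the one-dimensional problem of convection with velocity $V$ in the
$x$-direction and diffusion with coefficient $D$:

\[ \frac{\partial u}{\partial t} + V\frac{\partial u}{\partial x} = D\frac{\partial^2 u}{\partial x^2}, \qquad 0 < x < \infty,\ t > 0 \tag{1} \]

\[ u(x, 0) = f(x) \tag{2} \]

\[ u(0, t) = 0 \tag{3} \]

\[ \|u(\cdot, t)\| < \infty \tag{4} \]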

If we choose a uniform space step $\Delta x$ and time step $\Delta t$, there are
two dimensionless quantities that are very important for the properties of the
scheme:

\[ \mu = \frac{D\,\Delta t}{\Delta x^2}, \qquad \nu = \frac{V\,\Delta t}{\Delta x}. \]

$\nu$ is called the Courant (or CFL) number.
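As a small illustration of these two quantities, the following sketch evaluates them for one set of sample values. The function name and the numerical values are assumptions chosen for the example; they do not come from the paper.

```python
def dimensionless_numbers(D, V, dx, dt):
    """Return (mu, nu) with mu = D*dt/dx**2 and nu = V*dt/dx (Courant number)."""
    mu = D * dt / dx**2
    nu = V * dt / dx
    return mu, nu


# Sample (assumed) values: diffusion 0.01, velocity 1, dx = 0.1, dt = 0.05
mu, nu = dimensionless_numbers(D=0.01, V=1.0, dx=0.1, dt=0.05)
```

Halving the space step at fixed time step doubles the Courant number but quadruples the diffusion number, which is why both have to be tracked when a mesh is refined.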

Before we describe the Quickest scheme and its numerical boundary condi- 
tion, we give in the next section, a brief overview of the Godunov-Ryabenkii 
theory. 



2 Godunov-Ryabenkii Stability Analysis 

Two essential aspects of normal mode analysis for the investigation of the
influence of boundary conditions on the stability of a scheme are that the
initial value problem needs to be stable for the Cauchy problem, which is best
analysed with the von Neumann method (this means the interior scheme needs to be
stable), and that its stability could be destroyed by the boundary conditions,
but the converse is not possible.

In this section we give a brief description of the Godunov-Ryabenkii theory.
For more detailed information about the theory we suggest [12, 13, 14] and
especially [4]. Particular note should be made of the work [15, 16], which
establishes a relation between the GKS theory and group velocity.

We can approximate the problem (1)-(4) by the difference scheme

\[ Q U_j^n = U_j^{n+1}, \qquad j = r, r+1, \ldots \tag{5} \]

\[ Q = \sum_{j=-r}^{p} a_j E^j, \qquad E U_j = U_{j+1}, \tag{6} \]

where $a_j$ are scalars.

Two important assumptions are made: 

a) The scalars $a_{-r}$ and $a_p$ are non-singular;

b) The finite difference scheme (5) is von Neumann stable. 

As $Q$ uses $r$ points to the left, the basic approximation cannot be used
at $x_0, x_1, x_2, \ldots, x_{r-1}$, so there we will have to apply boundary
conditions. These can be the conditions that are given for the original problem
(in our particular case this is associated only with the point $x_0$), but they
can also be difference schemes, which will then be called numerical boundary
conditions. The choice of numerical boundary conditions is crucial for the
stability.

Let us assume that the boundary conditions can be written as

\[ U_\beta^{n+1} = \sum_{j=1}^{q} l_{\beta j} U_j^n, \qquad \beta = 0, 1, \ldots, r-1, \tag{7} \]

where $l_{\beta j}$ are scalars.

The eigenvalue problem associated with our approximation is:

\[ z\phi_j = Q\phi_j, \qquad j = r, r+1, \ldots \tag{8} \]

\[ z\phi_\beta = \sum_{j=1}^{q} l_{\beta j}\phi_j, \qquad \beta = 0, 1, \ldots, r-1 \tag{9} \]

\[ \|\phi\|_h < \infty \tag{10} \]



Lemma 1 (Godunov-Ryabenkii condition). The approximation is unstable if the
eigenvalue problem (8)-(10) has an eigenvalue $z$ with $|z| > 1$.

Consider the characteristic equation of the interior scheme

\[ z - \sum_{j=-r}^{p} a_j k^j = 0. \tag{11} \]

Lemma 2. For $z$ such that $|z| > 1$, there is no solution of equation (11) with
$|k| = 1$ and there are exactly $r$ solutions, counted according to their
multiplicity, with $|k| < 1$.

A general solution of (8)-(10) is of the form

\[ \phi_j = \sum_{|k_\alpha| < 1} P_\alpha(j)\, k_\alpha^j, \qquad |z| > 1, \tag{12} \]

where $k_\alpha$ are solutions of the characteristic equation (11). This solution
depends on $r$ free parameters $\sigma = (\sigma_1, \ldots, \sigma_r)$.
$P_\alpha(j)$ is a polynomial in $j$. Its order is at most $m_\alpha - 1$, where
$m_\alpha$ is the multiplicity of $k_\alpha$.

Note that if the solutions are simple, this implies that the solution has the
form

\[ \phi_j = \sum_{|k_\alpha| < 1} \sigma_\alpha k_\alpha^j. \tag{13} \]

This form of the solution is the one that usually arises in practice. 

Substituting (12) into the boundary conditions (7) yields a system of equations

\[ C(z)\sigma = 0, \]

$\sigma = (\sigma_1, \ldots, \sigma_r)$, and we can rephrase Lemma 1 in the
following form:

Lemma 3. The approximation is unstable if

\[ \det C(z) = 0 \quad \text{for some } z \in \mathbb{C} \text{ with } |z| > 1. \]

Summarising, this theory is a generalisation of the von Neumann stability
analysis taking into account the influence of boundary conditions. It states that
the interior scheme needs to be von Neumann stable and that, when considered in
the half-plane $x > 0$, a mode $k^j$ with $|k| > 1$ will lead to an unbounded
solution in space, that is, $k^j$ will increase without bound when $j$ goes to
infinity. Therefore $|k|$ should be lower than one, and the Godunov-Ryabenkii
stability condition states that all the modes with $|k| < 1$, generated by the
boundary conditions, should correspond to $|z| < 1$.



3 Instability of a Quickest Scheme 

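Consider the interior difference scheme Quickest:

\[ U_j^{n+1} = \left[1 - \nu\Delta_0 + \left(\tfrac{1}{2}\nu^2 + \mu\right)\delta^2 + \nu\left(\tfrac{1}{6} - \tfrac{\nu^2}{6} - \mu\right)\delta^2\Delta_-\right] U_j^n, \tag{14} \]

where we use the central, backward and second difference operators:
$\Delta_0 U_j := (U_{j+1} - U_{j-1})/2$, $\Delta_- U_j := U_j - U_{j-1}$ and
$\delta^2 U_j := U_{j+1} - 2U_j + U_{j-1}$.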
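A minimal sketch of the interior update may make the stencil concrete. It assumes the scheme in the form of equation (14) and uses the shorthand $c_1 = \nu/2$, $c_2 = \nu^2/2 + \mu$, $c_3 = \nu(1 - \nu^2 - 6\mu)/6$ that the text introduces with equation (17); the function names are illustrative only.

```python
import cmath


def quickest_coefficients(mu, nu):
    """Shorthand coefficients c1, c2, c3 of the Quickest stencil."""
    c1 = nu / 2.0
    c2 = nu**2 / 2.0 + mu
    c3 = nu * (1.0 - nu**2 - 6.0 * mu) / 6.0
    return c1, c2, c3


def quickest_interior(u, j, mu, nu):
    """One Quickest update at interior index j; uses u[j-2] .. u[j+1]."""
    c1, c2, c3 = quickest_coefficients(mu, nu)
    central = u[j + 1] - u[j - 1]                 # 2 * Delta_0 u_j
    second = u[j + 1] - 2 * u[j] + u[j - 1]       # delta^2 u_j
    second_left = u[j] - 2 * u[j - 1] + u[j - 2]  # delta^2 u_{j-1}
    # delta^2 Delta_- u_j = delta^2 u_j - delta^2 u_{j-1}
    return u[j] - c1 * central + c2 * second + c3 * (second - second_left)


def amplification(xi, mu, nu):
    """Factor z such that the mode k**j, k = exp(i*xi), is multiplied by z."""
    c1, c2, c3 = quickest_coefficients(mu, nu)
    k = cmath.exp(1j * xi)
    return (1 - c1 * (k - 1 / k) + c2 * (k - 2 + 1 / k)
            + c3 * (k - 3 + 3 / k - 1 / k**2))
```

Applying `quickest_interior` to a sampled Fourier mode reproduces the mode multiplied by `amplification`, which is the quantity a von Neumann analysis of the interior scheme examines.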

We will consider two boundary conditions: the Dirichlet boundary condition
associated with the original problem, $U_0 = 0$, and the numerical boundary
condition that we need at the first point of the mesh,

\[ U_1^{n+1} = \left[1 - \nu\Delta_0 + \left(\tfrac{1}{2}\nu^2 + \mu\right)\delta^2 + \nu\left(\tfrac{1}{6} - \tfrac{\nu^2}{6} - \mu\right)\delta^2\Delta_+\right] U_1^n, \tag{15} \]

where $\Delta_+$ is the forward operator defined by $\Delta_+ U_j := U_{j+1} - U_j$.
This numerical boundary condition is deduced by a method similar to the one used
in [10] to obtain the Quickest scheme, which uses a local cubic interpolation of
the points $U_{j-2}^n, U_{j-1}^n, U_j^n, U_{j+1}^n$. On the first point we cannot
use this interpolation since we do not have the point $U_{-1}^n$. We instead
interpolate the points $U_0^n$, $U_1^n$, $U_2^n$ and $U_3^n$, which gives the
difference scheme (15). The use of this downwind third difference at
$x = \Delta x$ does not affect accuracy because it is still based on a cubic
local approximation near $x = \Delta x$, as in the interior scheme. However, as
we shall show, it does have penalties in terms of stability.

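Let us consider the corresponding eigenvalue problem:

\[ \begin{aligned}
z\phi_j &= \left[1 - \nu\Delta_0 + \left(\tfrac{1}{2}\nu^2 + \mu\right)\delta^2 + \nu\left(\tfrac{1}{6} - \tfrac{\nu^2}{6} - \mu\right)\delta^2\Delta_-\right]\phi_j, \qquad j \ge 2, \\
\phi_0 &= 0, \\
z\phi_1 &= \left[1 - \nu\Delta_0 + \left(\tfrac{1}{2}\nu^2 + \mu\right)\delta^2 + \nu\left(\tfrac{1}{6} - \tfrac{\nu^2}{6} - \mu\right)\delta^2\Delta_+\right]\phi_1.
\end{aligned} \tag{16} \]

The Godunov-Ryabenkii condition tells us that if the system (16) has an
eigenvalue $z$ with $|z| > 1$, then the approximation (14)-(15) is not stable. By
Lemma 2 we have for this approximation that the characteristic equation for the
interior scheme (14) has no solution $k = e^{i\xi}$, $\xi$ real, for $|z| > 1$
and there are exactly two solutions $k_i$, $i = 1, 2$, with $|k_i| < 1$ for
$|z| > 1$.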

Consider the characteristic equation for the interior scheme (14)

\[ k^3(-c_1 + c_2 + c_3) + k^2(-z + 1 - 2c_2 - 3c_3) + k(c_1 + c_2 + 3c_3) - c_3 = 0, \tag{17} \]

where $c_1 = \nu/2$, $c_2 = \nu^2/2 + \mu$ and $c_3 = \nu(1 - \nu^2 - 6\mu)/6$.
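The root count stated in Lemma 2 can be checked numerically for a given $z$. The sketch below is not the authors' procedure: it counts the roots of the cubic (17) inside the unit circle with the argument principle, that is, by accumulating the phase change of the polynomial along $|k| = 1$. The function names, the sample count and the sample parameters are assumptions.

```python
import cmath
import math


def characteristic_polynomial(k, z, mu, nu):
    """Left-hand side of the characteristic equation (17)."""
    c1 = nu / 2.0
    c2 = nu**2 / 2.0 + mu
    c3 = nu * (1.0 - nu**2 - 6.0 * mu) / 6.0
    return (k**3 * (-c1 + c2 + c3) + k**2 * (-z + 1 - 2 * c2 - 3 * c3)
            + k * (c1 + c2 + 3 * c3) - c3)


def roots_inside_unit_circle(z, mu, nu, samples=4000):
    """Winding number of the polynomial along |k| = 1 (argument principle)."""
    total = 0.0
    previous = characteristic_polynomial(1.0 + 0j, z, mu, nu)
    for m in range(1, samples + 1):
        k = cmath.exp(2j * math.pi * m / samples)
        current = characteristic_polynomial(k, z, mu, nu)
        total += cmath.phase(current / previous)  # small phase increment
        previous = current
    return round(total / (2 * math.pi))


# Sample values mu = 0.1, nu = 0.5 and z = 2 (so |z| > 1)
count = roots_inside_unit_circle(2.0, 0.1, 0.5)
```

The count is only reliable when no root lies on or very near the unit circle; for such cases the number of samples would have to be increased.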

Assuming that the two solutions of the characteristic equation are distinct,
any solution of (16) has the form

\[ \phi_j = \sigma_1 k_1^j(z) + \sigma_2 k_2^j(z). \]

We want to find the solutions $k_i$, $i = 1, 2$, of (17) such that
$|k_i(z)| < 1$, $i = 1, 2$, and the linear and homogeneous system

\[ \begin{aligned}
\sigma_1 + \sigma_2 &= 0 \\
\sigma_1 g(k_1, z, \mu, \nu) + \sigma_2 g(k_2, z, \mu, \nu) &= 0
\end{aligned} \tag{18} \]

has a solution $z$ with $|z| > 1$. The function $g(k, z, \mu, \nu)$ is the
polynomial:

\[ g(k, z, \mu, \nu) = k^3 c_3 + k^2(-c_1 + c_2 - 3c_3) + k(1 - 2c_2 + 3c_3 - z). \]

Since the first equation gives $\sigma_1 = -\sigma_2$, the linear homogeneous
system (18) has a non-trivial solution if

\[ g(k_1, z, \mu, \nu) - g(k_2, z, \mu, \nu) = 0. \]
Consider $k_1(z)$ and $k_2(z)$ defined as:

\[ k_1(z) = \frac{r_1}{2} + \frac{\sqrt{-3r_1^2 + 4r_2}}{2}, \qquad k_2(z) = \frac{r_1}{2} - \frac{\sqrt{-3r_1^2 + 4r_2}}{2}, \]

where $r_1$ and $r_2$ are:

\[ r_1(z, \mu, \nu) = \frac{(1 - z)(-c_1 + c_2 + c_3) - 4c_1c_3 + 2c_2(c_1 - c_2)}{(1 - z)c_3 - 2c_1c_3 - (c_1 - c_2)^2} \tag{19} \]

\[ r_2(z, \mu, \nu) = \frac{(1 - z)(z - 1 + 4c_2) - (c_1^2 + 6c_1c_3 + 3c_2^2)}{(1 - z)c_3 - 2c_1c_3 - (c_1 - c_2)^2} \tag{20} \]

Let $f(k, z, \mu, \nu)$ denote the characteristic polynomial for the interior
scheme (see (17)). After some algebraic manipulations we can prove that for
$c_3 \neq 0$, $k_1(z)$ and $k_2(z)$ are solutions of

\[ f(k_1, z, \mu, \nu) - f(k_2, z, \mu, \nu) = 0 \tag{21} \]

\[ g(k_1, z, \mu, \nu) - g(k_2, z, \mu, \nu) = 0. \tag{22} \]

If, in addition to (21), $k_1(z)$ and $k_2(z)$ verify
$f(k_1, z, \mu, \nu) + f(k_2, z, \mu, \nu) = 0$, then $k_1(z)$ and $k_2(z)$ are
solutions of $f$. In that way we have two solutions of $f$ that verify (22). Note
that the characteristic polynomial $f$ is a third order polynomial, which means
we expect three roots, although we only find the analytical solution of two of
them. Let $C(z, \mu, \nu) = f(k_1, z, \mu, \nu) + f(k_2, z, \mu, \nu)$. For each
$(\mu, \nu)$ we want to find $z_{\mu\nu}$ such that $C(z_{\mu\nu}, \mu, \nu) = 0$.
The requirement for instability is $|z_{\mu\nu}| > 1$. Experimentally we observe
that the solution $z(\mu, \nu)$ lies inside $|z| = 1$ for certain values of $\mu$
and $\nu$ and then crosses it at $z = -1$. We can say $z = -1$ is the value of
transition from stable to unstable.

The function $C(z, \mu, \nu)$ has the form

\[ \begin{aligned}
C(z, \mu, \nu) ={}& r_1(3r_2 - 2r_1^2)(-c_1 + c_2 + c_3) \\
&+ (2r_2 - r_1^2)(-z + 1 - 2c_2 - 3c_3) \\
&+ r_1(c_1 + c_2 + 3c_3) - 2c_3,
\end{aligned} \]

with $r_1 = r_1(z, \mu, \nu)$ and $r_2 = r_2(z, \mu, \nu)$.

Let $p(\mu, \nu) = C(-1, \mu, \nu)$. We plot $p(\mu, \nu) = 0$ in Fig. 1a).

For $(\mu, \nu)$ such that $p(\mu, \nu) < 0$ there exists an eigenmode
$z_{\mu\nu} < -1$ such that $C(z_{\mu\nu}, \mu, \nu) = 0$ (Fig. 1b)).

This means that for $(\mu, \nu)$ in the set $\{(\mu, \nu) : p(\mu, \nu) < 0\}$
there exists $z_{\mu\nu}$ real and less than $-1$ such that
$k_1(z_{\mu\nu}, \mu, \nu)$ and $k_2(z_{\mu\nu}, \mu, \nu)$ are solutions of $f$
and verify (22). To assure that this eigenmode $z_{\mu\nu}$, whose absolute value
is bigger than one, determines an unstable region we still need to verify that
for these $(\mu, \nu)$ we do have $|k_i(z_{\mu\nu}, \mu, \nu)| < 1$, $i = 1, 2$.

For $z$ fixed let us define the following sets:
$A_z = \{(\mu, \nu) : |k_1(z, \mu, \nu)| < 1\}$ and
$B_z = \{(\mu, \nu) : |k_2(z, \mu, \nu)| < 1\}$. For $z < -1$, $B_z \subset A_z$,
i.e., if $|k_2(z, \mu, \nu)| < 1$ then $|k_1(z, \mu, \nu)| < 1$. We plot
$C(z, \mu, \nu) = 0$ and $B_z$ for $z = -1, -1.5$ in Fig. 2.

Fig. 1. a) $p(\mu, \nu) = 0$; b) $C(z, \mu, \nu) = 0$ for $z = -1, -1.2, -1.5, -2$

Fig. 2. a) $C(-1, \mu, \nu) = 0$ is the line (-) and $B_{-1}$ is the region between
the lines (--); b) $C(-1.5, \mu, \nu) = 0$ is the line (-) and $B_{-1.5}$ is the
region between the lines (--)

From the figure we observe that in the region $B_{-1}$ the root
$k_2(-1, \mu, \nu)$, for $(\mu, \nu)$ such that $C(-1, \mu, \nu) = 0$, becomes
bigger than one approximately for $\nu < 0.09$. For $z = -1.5$ the same happens
but for $\nu$ even smaller. Since one of the roots we found becomes larger than
one, we cannot conclude anything about the instability of the method for
$\nu < 0.09$. This is not a big problem since the von Neumann condition gives us
a stability limit for this region. We will plot the curve $p(\mu, \nu) = 0$ for
$\nu > 0.09$ and the von Neumann stability condition. We can see the unstable
region plotted in Fig. 3. In fact, numerical experiments show that the region
called stable in Fig. 3 is the exact region of practical stability.
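The quantities used in this section can be evaluated numerically. The following sketch assumes the formulas (17), (19), (20) and the polynomial $g$ as given in the text; the function names are illustrative, and the code is a way to experiment with $C(z, \mu, \nu)$ and $p(\mu, \nu)$ rather than a reproduction of the authors' computations.

```python
import cmath


def coefficients(mu, nu):
    c1 = nu / 2.0
    c2 = nu**2 / 2.0 + mu
    c3 = nu * (1.0 - nu**2 - 6.0 * mu) / 6.0
    return c1, c2, c3


def f(k, z, mu, nu):
    """Characteristic polynomial of the interior scheme, equation (17)."""
    c1, c2, c3 = coefficients(mu, nu)
    return (k**3 * (-c1 + c2 + c3) + k**2 * (-z + 1 - 2 * c2 - 3 * c3)
            + k * (c1 + c2 + 3 * c3) - c3)


def g(k, z, mu, nu):
    """Polynomial arising from the numerical boundary condition."""
    c1, c2, c3 = coefficients(mu, nu)
    return (k**3 * c3 + k**2 * (-c1 + c2 - 3 * c3)
            + k * (1 - 2 * c2 + 3 * c3 - z))


def r1_r2(z, mu, nu):
    """Equations (19) and (20); the common denominator must be nonzero."""
    c1, c2, c3 = coefficients(mu, nu)
    denom = (1 - z) * c3 - 2 * c1 * c3 - (c1 - c2) ** 2
    r1 = ((1 - z) * (-c1 + c2 + c3) - 4 * c1 * c3
          + 2 * c2 * (c1 - c2)) / denom
    r2 = ((1 - z) * (z - 1 + 4 * c2)
          - (c1**2 + 6 * c1 * c3 + 3 * c2**2)) / denom
    return r1, r2


def k1_k2(z, mu, nu):
    r1, r2 = r1_r2(z, mu, nu)
    root = cmath.sqrt(-3 * r1**2 + 4 * r2)  # may be complex
    return (r1 + root) / 2, (r1 - root) / 2


def C(z, mu, nu):
    k1, k2 = k1_k2(z, mu, nu)
    return f(k1, z, mu, nu) + f(k2, z, mu, nu)


def p(mu, nu):
    return C(-1.0, mu, nu)
```

For any admissible $(z, \mu, \nu)$ the pair returned by `k1_k2` satisfies (21) and (22) up to rounding, so a sign change of `p` over a grid of $(\mu, \nu)$ values is one way to trace the curve $p(\mu, \nu) = 0$.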
Abstract. Modelling the macroscopic behaviour of inhomogeneous media
requires evaluation of the effective moduli. In the relevant literature, many
papers are concerned with estimating the macroscopic moduli $\lambda_e(x)$ for
two-phase materials. The main aim of this paper is to find the best estimation
of $\lambda_e(x)$ from a given finite number of coefficients of power expansions
of $\lambda_e(x)$ at $x = x_i$, $i = 1, 2, \ldots, N, N+1$, and to apply it to
model the torsional behaviour of a human cancellous bone filled with marrow.
Errors of numerical evaluations are calculated and discussed.

1 Introduction 

Macroscopic modelling of microinhomogeneous media requires the evaluation of
effective moduli. However, their exact values are available only in specific
cases, for instance in one-dimensional periodic homogenization. In the relevant
literature, many papers are concerned with estimating the effective coefficients
$\lambda_e(x)$, also for bio-materials such as human bones.

The main aim of this contribution is to establish general bounds on the
coefficients $\lambda_e(x)$ generated by an arbitrary number of coefficients of
power expansions of $\lambda_e(x)$ at $x = x_1, x_2, \ldots, x_N < \infty$ and
$x_{N+1} = \infty$, and to apply them to the biomechanical problem of torsion of
cancellous bone filled with marrow.

2 Preliminaries 

Let us consider a Stieltjes function $f_1(x)$ represented by

l/il+Xi) 



and satisfying the inequality $f_1(-1) < 1$. Here the spectra $\gamma_i(u)$,
$i = 1, 2, \ldots, N$, and $\gamma_\infty(u)$ are real, bounded and
non-decreasing functions. Power expansions of $f_1(x)$ at $x = x_i$,
$i = 1, 2, \ldots, N$, and $x_{N+1} = \infty$ are given by




( 1 ) 



0 



oo 



fi{x) = Ci,i,k+i{x - Xi)^, i = 0,l,...,N, 




(2) 



L. Vulkov, J. Wasniewski, and P. Yalamov (Eds.): NAA 2000, LNCS 1988, pp. 741—748, 2001. 
Table 1. Discrete values of the elastic torsional modulus $\mu_e/\mu_1 - 1$ for
hexagonal array of cylinders, after [6]

  x      φ=0.76     φ=0.80     φ=0.84     φ=0.88
 -1     -0.8711    -0.8996    -0.9286    -0.9607
  0      0.0000     0.0000     0.0000     0.0000
  9      3.3778     3.9489     4.6887     5.7225
 49      5.7076     7.2600     9.7931     5.1565
  ∞      6.7600     8.9586    13.0093    24.4508



The coefficients $c_{1,i,k}$, $i = 1, 2, \ldots, N$ and $d_{\infty,1,k}$ are
assumed to be finite for any fixed $k$ and $i$. Let us introduce the rational
functions $g_C(x; p_{x_1}, p_{x_2}, \ldots, p_{x_N}, p_\infty)$ and
$h_D(x; p_{x_1}, p_{x_2}, \ldots, p_{x_N}, p_\infty)$ defined by the relations:



9c{x',Pxi,Px2t ■■■jPxm tPoo) 



a'i X + a'nX^ + ... + o', 



E[{P+1)/2\ 



^E[(P+A)/2] 



hD{x',Pxi,Px2J ■■■tPxn jPoo) 



1 + b[x + b^x'^ + ... + 

T -U -U 7-^[(-P+1 + ^)/2] 

aiX -h a2X -h ... -h ^e[{P-\-1-\-A)/2]^ 



1 + b'(x + h'ix^ + ... + 

dn{ ^iPxi I Px 2 ! ■■■tPxm^ Poo) — 1; 



( 3 ) 



where 



C = E[{P + Z\)/2] + E{P/2)-, D = E[{P + 1 + Z\)/2] + E[{P + l)/2] 
P = tp,+Poo. = I J ; [j , E{Q = max{[/ < C}. 



The parameters $p_{x_1}, p_{x_2}, \ldots, p_{x_N}, p_\infty$ appearing in (3) and
(4) denote the numbers of coefficients of power expansions of $f_1(x)$ at
$x_1, x_2, \ldots, x_N$, $x_{N+1} = \infty$. The functions
$g_C(x; p_{x_1}, p_{x_2}, \ldots, p_{x_N}, p_\infty)$ and
$h_D(x; p_{x_1}, p_{x_2}, \ldots, p_{x_N}, p_\infty)$ become diagonal and
subdiagonal multipoint Padé approximants to the Stieltjes function $x f_1(x)$,
if they satisfy:



xfi(x) - gc(x) = 0((x- Xi)P^+^) , i = l,2,. 

= g^o = 

xfi(x) - hoix) = 0{{x- Xi)P^+^^) , i = l,2,. 



..,N, 

( Poo if P is even 
\ Poo — 1 if F is odd 
..,N, 



xfi{x) - hoix) = O 




Poo ~ 1 if F is even 
Poo if F is odd 



( 5 ) 



where 



gc{x) = gc{x;Pxi,Px 2 ,--oPxN^Poo);hD{x) = hD{x;pxi,Px 2 , --oPxn^Poo)- (6) 

For the sake of simplicity the notations
$g_C(x; p_{x_1}, p_{x_2}, \ldots, p_{x_N}, p_\infty)$, $g_C(x)$ and
$h_D(x; p_{x_1}, p_{x_2}, \ldots, p_{x_N}, p_\infty)$, $h_D(x)$ will be used
equivalently.
Fig. 1. Multipoint Padé upper and lower bounds on
$(\lambda_e(x)/\lambda_1) - 1 = \ln(0.5(x+2))$ predicted by Theorem 1

3 Inequalities for Multipoint Padé Approximants

In our paper [12] the following theorem was proved, establishing the general
inequalities for multipoint Padé approximants to $\lambda_e(x)/\lambda_1 - 1$.

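Theorem 1. For any fixed $x \in (-1, x_1), (x_1, x_2), \ldots, (x_K, x_{K+1}),
\ldots, (x_N, \infty)$ the multipoint Padé approximants
$g_C(x; p_{x_1}, p_{x_2}, \ldots, p_{x_N}, p_\infty)$ and
$h_D(x; p_{x_1}, p_{x_2}, \ldots, p_{x_N}, p_\infty)$ to the expansions of
$(\lambda_e(x)/\lambda_1) - 1$ available at
$x = x_1, x_2, \ldots, x_K, x_{K+1}, \ldots, x_{N+1} = \infty$ obey the following
inequalities:

(i) If $x \in (-1, x_1)$ then

\[ g_C(x) > \frac{\lambda_e(x)}{\lambda_1} - 1 > h_D(x). \tag{7} \]

(ii) If $x \in (x_K, x_{K+1})$, $K = 1, 2, \ldots, N$, then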

gc{x) < (y -Ij < hD{x),P^ = 

^ ^ ^ k=l 

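where $\lambda_e(x)/\lambda_1$ stands for the limit as
$P = \sum_{i=1}^{N} p_{x_i} + p_\infty$ goes to infinity of
$1 + g_C(x; p_{x_1}, p_{x_2}, \ldots, p_{x_N}, p_\infty)$ or
$1 + h_D(x; p_{x_1}, p_{x_2}, \ldots, p_{x_N}, p_\infty)$ in
$x \in (-1, \infty)$.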
Fig. 2. Hexagonal array of elastic cylinders with volume fraction $\varphi$ and
physical parameter $x = \mu_2/\mu_1 - 1$. (a) Multipoint Padé bounds
($1 + g_4(x; 2, 1, 1, 1)$, solid lines) and ($1 + h_6(x; 2, 1, 1, 1)$, scattered
lines) on the torsional modulus $\mu_e/\mu_1$; upper and lower bounds almost
coincide. (b) An error
$\Delta = (g_4(x; 2, 1, 1, 1) - h_6(x; 2, 1, 1, 1)) / (1 + g_4(x; 2, 1, 1, 1))$
for $\mu_e/\mu_1$



4 Hexagonal Array of Elastic Cylinders 

Let us consider an elastic beam reinforced with elastic fibers arranged in a
hexagonal lattice. Assume that $\lambda_1$ and $\mu_1$ are the Lamé constants of
the matrix, while $\lambda_2$ and $\mu_2$ are the Lamé coefficients of the
fibers. By $\varphi$ we denote the volume fraction of inclusions. By using the
classical homogenization procedure the following equations defining the
effective torsion modulus $\mu_e/\mu_1$ have been derived, see [11]

w = m^YU{y)^dy, 

o|rK)^) + 4K)^)=0’ (9) 

jh ]h-fy = 9 , 

where $U(y)$ is a characteristic function. The effective modulus $\mu_e/\mu_1$
has a Stieltjes integral representation given by (1). For three dimensional
materials the power expansion of $\mu_e/\mu_1 - 1$ at $x = 0$ ($x = h - 1$,
$h = \mu_2/\mu_1$) takes the form

Pe/ yi - ^ = + ^p{l - p)x'^ + 0{x^) ( 10 ) 

The discrete values of $\mu_e/\mu_1 - 1$ are reported in [6], cf. also Table 1.
Multipoint Padé approximants
$g_4(x; p_0, p_9, p_{49}, p_\infty) = g_4(x; 2, 1, 1, 1)$ and
$h_6(x; p_0, p_9, p_{49}, p_\infty) = h_6(x; 2, 1, 1, 1)$ evaluated from the
input data (10) and Table 1 estimate the effective shear modulus $\mu_e/\mu_1$
from above and below, see Fig. 2a. From Fig. 2b we conclude that the torsional
modulus $\mu_e/\mu_1$ differs from the multipoint Padé approximants
$1 + g_4(x; 2, 1, 1, 1)$ and $1 + h_6(x; 2, 1, 1, 1)$ by less than 0.3%. On
account of that we take the rational function

\[ \frac{\mu_e}{\mu_1} - 1 = h_6(x; p_0, p_9, p_{49}, p_\infty) = h_6(x; 2, 1, 1, 1), \qquad \varphi \le 0.88, \tag{11} \]

as a solution of the system of Eqs (9).

5 Modelling of Torsional Behaviour of Cancellous Bone 

Let us consider an inhomogeneous beam consisting of viscoelastic cylinders reg- 
ularly spaced in a viscoelastic phase. 

For the investigation of the macroscopic responses of that porous beam the
well known elastic-viscoelastic correspondence principle will be used, cf. [3].
That principle reads: the complex torsional modulus $\mu_e^*(z)/\mu_1^*$ of a
viscoelastic system is obtained by replacing in (11) the real variable $x$ by a
complex one $z$. Hence we get

\[ \mu_e^*(z)/\mu_1^* = 1 + h_6(z; 2, 1, 1, 1), \quad \text{for } \varphi \le 0.88, \quad z = x + iy = \mu_2^*/\mu_1^* - 1. \tag{12} \]

Of interest is the composite consisting of fluid cylinders of viscosity $\mu_2$
regularly spaced in an elastic matrix of shear modulus $\mu_1$. Such a composite
material models a cancellous human bone filled with marrow, see Fig. 3. By
substituting $\mu_1^* = \mu_1$, $\mu_2^* = I\omega\mu_2$ into (12) we obtain the
complex modulus of a prismatic porous beam filled with viscous fluid

\[ \mu_e^*(z)/\mu_1 = 1 + h_6\!\left(\frac{I\omega\mu_2}{\mu_1} - 1;\ 2, 1, 1, 1\right), \quad \text{for } \varphi \le 0.88. \tag{13} \]




Fig. 3. (a) The scanning electron micrograph showing a prismatic structure of 
cancellous bone for a sample taken from the femoral head, cf. [4] pp. 318. (b) 
An idealized structural model of a prism-like cancellous bone 
Fig. 4. Complex torsional modulus for the elastic porous beam filled with viscous
fluid; $\varphi$ = 0.76, 0.80, 0.84, 0.88

Figs 4 and 5 depict the complex modulus $\mu_e^*(z)/\mu_1$ and also its real and
imaginary parts. Note that the moduli $\mu_e^*(z)/\mu_1$ and compliances
$\mu_1/\mu_e^*(z)$, $z = (I\omega/\kappa) - 1$, $\kappa = \mu_1/\mu_2$, divided by
$I\omega$ are Fourier transformations of the torsional creep function $\Phi(t)$
and torsional relaxation function $\Psi(t)$, respectively, cf. [3]. Hence we can
write



= = , = , 14 ) 

Iuj^ll{z) fli luj^i 



The inverse Fourier transformations of $\Phi(I\omega)$ and $\Psi(I\omega)$ are given by

= d ‘=+ ^ (1 - (1 + , 

n—1 
n—1 ^ 



(15) 



Here the coefficients $d$, $b_n$ and $a_n$ take the values listed below

  φ       d        b_1      b_2      b_3      a_1      a_2      a_3
 0.76   0.1289   0.0146   0.0655   0.9348   2.1958   0.8867   0.1238
 0.80   0.1004   0.0476   0.0666   0.9226   1.9109   0.6432   0.0948
 0.84   0.0714   0.0755   0.0925   0.9011   2.3972   0.5213   0.0656
 0.88   0.0393   0.1339   0.1471   0.8608   3.9565   0.4500   0.0344
                                                                    (16)
Fig. 5. The torsional creep function $\Phi(t)$ and torsional relaxation function
$\Psi(t)$ for a porous beam consisting of a hexagonal array of viscous fluid
cylinders spaced in a linear elastic matrix



and

  φ       d        b_1      b_2      b_3      a_1      a_2      a_3
 0.76   7.7600   60.980   0.0974   0.0431   8.0939   2.1575   0.8312
 0.80   9.9586   102.62   0.1831   0.0218   10.557   1.8143   0.6035
 0.84   14.009   209.39   0.4192   0.0192   15.275   2.2101   0.4768
 0.88   25.451   737.96   1.6569   0.0223   29.669   3.4456   0.3877
                                                                    (17)

The torsional creep function $\Phi(t)$ given by (15)-(17) and the relaxation
function $\Psi(t)$ determined by (15)-(16) have been depicted in Fig. 5.

6 Summary and Conclusions 

By applying the multipoint Padé approximants discussed in [2], new upper
and lower bounds on real-valued transport coefficients of two-phase media have
been established (Theorem 1). The bounds obtained incorporate an arbitrary
number of coefficients of power expansions of $\lambda_e(x)/\lambda_1$ available
at a finite number of points. Consequently the estimates (7)-(8) generalize the
bounds reported earlier in the literature.

Multipoint Padé bounds (7)-(8) have been used to study the torsional
behaviour of an idealized model of cancellous human bone. The torsional
rigidities (complex modulus, creep and relaxation functions) have been
evaluated. By analyzing the graphs we observe a hydraulic stiffening of the bone
due to the presence of bone marrow.

Multipoint Padé approximants are particularly suitable for implementation
in mechanical problems. Their evaluation from the given coefficients of power
expansions of analytical functions leads to fast, accurate numerical algorithms,
which are simply recursive and do not involve the solution of a large number of
equations, see [8].
Numerical Algorithm for Studying Hydrodynamics in a Chemical Reactor with a
Mixer

I. Zheleva and A. Lecheva

Rousse University, Technology College
7200 Razgrad, POB 110, Bulgaria

Abstract. Mixing in stirred vessels is acknowledged to be one of the most
important characteristics of many industrial technologies. Research in this
area has therefore concentrated for a long time on visual flow studies and on
measurements of overall properties such as power consumption, the overall gas
holdup and the overall mass transfer rate. Although some features of the flow
regimes inside the vessels can be obtained in this way, mathematical modelling
of the mixing processes has recently become a useful and effective method of
investigation.

This paper presents a mathematical model for hydrodynamics in a cylindrical
reactor with one Rushton mixer. The liquid is assumed to be incompressible and
the process 2D and steady state. The Navier-Stokes equations are written in
terms of the stream function and vorticity.

A numerical methodology for simulating mixing processes in stirred vessels is
also presented. The numerical algorithm for solving these equations is based on
an alternating direction implicit method on an irregular mesh.

The proposed algorithm is tested on different model tasks. The initial
numerical results for hydrodynamics are presented graphically.



Introduction 

Mixing is one of the most important processes used in many chemical
productions, especially in fermentation processes for obtaining antibiotics.

In these technologies the biggest part of the power consumption is usually
spent on mixing. Because of this, a detailed investigation of mixing processes
is still needed. For many real technologies physical measurements and natural
experiments are very expensive and at the same time not accurate enough. So
mathematical modelling has recently become a reliable tool for studying complex
technological processes.

This paper presents a mathematical model and a numerical algorithm for the
hydrodynamics of a chemical reactor with a mixer.

1 Mathematical Formulation 

1.1 Scheme of the Chemical Reactor with a Mixer 

The chemical reactor with a mixer is a cylindrical tank with radius R and height
Z (Fig. 1). The mixer is situated at height H1 on the cylinder axis, the radius
of the mixer spades is R1 and its thickness is L1. We assume firstly that the
mixer is a disc with radius R1 and thickness L1.

The chemical reactor is filled with viscous incompressible fluid. The mixer
rotates with a given constant angular velocity $\Omega$.

We introduce a cylindrical coordinate system $(r, \varphi, z)$, which is also
given in Fig. 1.




Fig. 1. 



Fig. 2. 



1.2 Basic Equations 

For the formulation of the mathematical model it is assumed that the fluid in
the chemical reactor is an incompressible homogeneous Newtonian fluid and its
rotating motion is steady-state and axisymmetric.

The mass and the momentum conservation equations express the dynamic
behaviour of the fluid. The Navier-Stokes equations are written in the
introduced cylindrical coordinate system in the axisymmetric case:



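\[ \frac{\partial (rV_r)}{\partial r} + \frac{\partial V_\varphi}{\partial \varphi} + \frac{\partial (rV_z)}{\partial z} = 0 \tag{1} \]

\[ \frac{\partial V_r}{\partial t} + V_r\frac{\partial V_r}{\partial r} + V_z\frac{\partial V_r}{\partial z} - \frac{V_\varphi^2}{r} = -\frac{1}{\rho}\frac{\partial p}{\partial r} + \nu\left(\nabla^2 V_r - \frac{V_r}{r^2}\right) \tag{2} \]

\[ \frac{\partial V_\varphi}{\partial t} + V_r\frac{\partial V_\varphi}{\partial r} + V_z\frac{\partial V_\varphi}{\partial z} + \frac{V_r V_\varphi}{r} = \nu\left(\nabla^2 V_\varphi - \frac{V_\varphi}{r^2}\right) \tag{3} \]

\[ \frac{\partial V_z}{\partial t} + V_r\frac{\partial V_z}{\partial r} + V_z\frac{\partial V_z}{\partial z} = -\frac{1}{\rho}\frac{\partial p}{\partial z} + \nu\nabla^2 V_z \tag{4} \]

where $\nabla^2 = \frac{1}{r}\frac{\partial}{\partial r}\left(r\frac{\partial}{\partial r}\right) + \frac{\partial^2}{\partial z^2}$,
$\rho$ is the mass density, $V(V_r, V_\varphi, V_z)$ is the velocity vector,
which is a function only of $r$ and $z$, $p$ is the pressure and $\nu$ is the
kinematic viscosity of the fluid.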
We will look only for a stationary solution of these equations because we will
examine the work of the reactor after it has started. Then, obviously, the
motion is also stationary and axisymmetric.

1.3 Boundary Conditions 

On the solid walls of the reactor the velocity components are equal to zero

\[ V = 0 \tag{5} \]

and the mixer is rotating with a given constant velocity.

2 Numerical Algorithm 

2.1 Another form of the Equations 

To describe the numerical technique for simulating rotating processes in the
reactor, the equations (1)-(4) are written in another form by introducing the
stream function $\psi$ and vorticity $\omega$ in dimensionless form:



d / 1 dtp \ d / 1 dtp\ 
dr \r dr J dz \r dz J 

dM 1 f d{rM)\ _ 1 \d fld{rM)\ 

dt r \ dr J dz Re dr \r dr J 



doj ^duj 1 dM"^ 1 f , 1 9U' 

\-U \-W = — V a; 

dt dr dz r^ dz Re [ r"^ dz _ 



r oz r or 



d^M' 



(6) 

(7) 

(8) 
(9) 



Here V {U,V,W) is the velocity vector with components in the introduced 
coordinate system, M is the momentum of the tangential velocity, r = z = 

= 77^’^ = 7^’^^ = = ifm =V^.r),t = tQ and Re is the 

Reynolds number. 

The geometrical area for the axisymmetric task described above is shown in 
Fig.2. 

The boundary conditions are: 
As we will look for a stationary solution of these equations, in describing
the numerical scheme we will treat t as a fictitious time parameter.

2.2 Grid

We construct a non-uniform grid. The points of the area given in Fig. 2, namely
A, B, C, D, M, N, have to be nodes of the grid. To describe the grid, the
geometrical area is divided into three parts in the z direction.

The grid can be refined by increasing the number of points in the critical
areas: near the mixer, near the walls of the reactor and near the symmetry
line Oz.
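One simple way to guarantee that prescribed points are grid nodes is a piecewise-uniform grid. The sketch below is an assumption made for illustration, not the authors' grid generator: the function name, the dimensionless reactor height of 1 and the cell counts are hypothetical, while the mixer position 0.4 and thickness 0.1 are sample values of the kind quoted in the numerical results.

```python
def piecewise_uniform_grid(breakpoints, cells_per_segment):
    """Nodes on [breakpoints[0], breakpoints[-1]]; every breakpoint is a node."""
    nodes = [breakpoints[0]]
    segments = zip(breakpoints, breakpoints[1:], cells_per_segment)
    for left, right, cells in segments:
        step = (right - left) / cells
        for m in range(1, cells):
            nodes.append(left + m * step)
        nodes.append(right)  # store the exact endpoint, no rounding drift
    return nodes


# z direction split into three parts: below the mixer, across it, above it
z_nodes = piecewise_uniform_grid([0.0, 0.4, 0.5, 1.0], [8, 4, 10])
```

Using more cells in the short middle segment concentrates nodes near the mixer, which is the kind of local refinement the text describes.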




Fig. 3. The grid (N - number of points in r direction, M1 - number of points
in z direction)



2.3 Approximation of the Vorticity Equation 

We write the following scheme for the vorticity equation (6)-(9): 



n+1/2 r? 

LO- — OJ- ■ 

^,J 


1 c 2 n+ 1/2 ^ c 2 n 1 /^n 

~ Re^'' ^ + 


( 11 ) 


0.5r 


^"+1 _ ^”+1/2 


= ^<52 "+1 + ±^2 "+I/2 
i?e ^ i?e 


( 12 ) 


0.5r 



where 



1 

^ dz 



<52 = 



dr'^ 



r dr ^ 






The boundary conditions (10) for the vorticity have to be calculated on
the basis of the known boundary conditions for the stream function. The
connection between the vorticity and the stream function can be found from
equation (6) if we assume that this equation is valid on the boundaries. Then
we can write the boundary conditions for the vorticity as [1]:






+ O (hr) on the right wall of the reactor, 

+ O {hz) on the top wall of the reactor, 
wij = — 7 ^^ + O {hz) on the bottom wall of the reactor, 
a; = 0 on the axis symmetrical line, 

^Hbz,j = — + O {hz) on the bottom side of the spade, 
<jJHbz+HU,j = + O {hz)on the top side of the spade. 



2.4 Approximation of the Stream Function Equation 



0.5r 

0.5r 



d f I dip 

dr \r dr 

d h I dip 
dr \r dr 



dz \r dz 



n+1 






d / 1 dip 
dz \r dz 



1,3 

n+1 



• OJ. 



^:3 



2.5 Approximation of the Momentum Equation 



(13) 

(14) 



- M 






0.5r 



1,3 ^ _L§2j^n+l/2 



Re 



+ - D 



Re 



^:3 



^,3 



^,3 






0.5r 



= J_j2^n+l/2 

i?e " i?e " 



-D 






(15) 

(16) 



where D = — yM — U^- — W^-, the operators Sf, 5l are the same 

like in equation ( 11 ). 



2.6 Alternating Direction Implicit Method 

Many methods for the numerical study of the Navier-Stokes equations have been
elaborated [1, 4]. We use the Alternating Direction Implicit Method for solving
the stream function, the momentum and the vorticity equations (11)-(16). The
special feature of this method is that the time step $\tau$ is realized in two
time layers, n+1/2 and n+1. The advantage of this method is that each difference
equation in this scheme has a tri-diagonal matrix form. For example, equation
(11) connects three implicit unknowns on the layer n+1/2 and equation (12)
connects three implicit unknowns on the layer n+1. The scheme is effective and
stable [1].

2.7 Numerical Algorithm 

We use an iterative procedure in which equations (11), (12) are solved first and
then $\omega$, $\psi$, U and W are calculated. The velocity components U and W are
defined on the basis of the calculated stream function. Therefore the vorticity
equation and the stream function equation must be solved together on the time
layers n+1/2 and n+1.

Fig. 4. Stream function isolines in (r, z) plane for Re=10

Fig. 5. Function M(r,z); Re=10

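The iterative procedure converges to a result of acceptable accuracy if a
proper grid and proper parameters are specified in the calculations. The
solution is considered converged if, for each of the three computed functions,
the quantity $\varepsilon = \max_{i,j} |\cdot|$, which measures the change between the
iterates n and n+1 at the grid nodes (i, j), satisfies $\varepsilon < \varepsilon_m$.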



3 Numerical Results 

The real problem is solved with the following values of the physical and
geometrical parameters: Re = 10, 100; mixer length R1 = 0.4, 0.5; mixer height
L1 = 0.1; position of the mixer H1 = 0.4, 0.5. The accuracy parameter for these
calculations is $\varepsilon_m$ = 0.01, 0.05.

We carried out several tests to verify the numerical algorithm; they indicate
that the developed algorithm works well for the test examples.

The result for the stream function is given in Fig. 4.

The result for the momentum is given in Fig. 5.

The result for the vorticity is given in Fig. 6a and Fig. 6b.

Fig. 6. Vector field in (r, z) plane, Re=10

Fig. 7. Vector field in (r, z) plane, Re=100

The calculated characteristics of the examined motion in the chemical reactor
with a mixer (shown in Fig. 4, 5, 6) correspond to visual and other examinations
of the hydrodynamics of the reactor [6].



4 Concluding Remarks 

A reliable numerical algorithm for investigating the hydrodynamics of a chemical
reactor with a mixer has been developed. The algorithm is based on the
alternating direction implicit method. The algorithm has been tested, and the
results are acceptable and promising for studying the hydrodynamics of mixing
vessels.
Abstract. Numerical modelling of stationary heat and mass transfer
processes in composite materials often leads to singularly perturbed
problems in composed domains, that is, to elliptic equations with
discontinuous coefficients and a small parameter $\varepsilon$ multiplying the highest
derivatives. The concentrated source acts on the interface boundary. For
such problems the application of domain decomposition (DD) methods
seems quite reasonable: the original domain is naturally partitioned into
several non-overlapping subdomains with smooth coefficients. Due to the
presence of transition and boundary layers, standard numerical methods
yield large errors for small $\varepsilon$. For this reason, we need special methods
whose errors are independent of the parameter $\varepsilon$. To construct such DD
schemes possessing the property of $\varepsilon$-uniform convergence, we use
standard finite difference approximations on piecewise uniform grids, which
are a priori refined in the transition and boundary layers.



1 Introduction 

The solutions of boundary value problems in composed domains have singularities
generated by the presence of discontinuities in the coefficients and right-hand
sides. When the problem in question is singularly perturbed, additional
singularities appear: the solution of such a problem typically contains transition
layers in the neighbourhood of concentrated sources, in addition to boundary
layers. Classical numerical methods from [1] developed for regular problems are
inapplicable for resolving boundary and interior layers because of large errors
(comparable to the exact solution itself) for small values of the perturbation
parameter $\varepsilon$ (see, e.g., [2]-[4]). We are interested in parameter-robust
numerical methods that converge independently of $\varepsilon$ (or $\varepsilon$-uniformly). Methods
based on domain decomposition are more attractive here. Clearly, specific
attention should be paid to non-overlapping DD techniques in which the
subdomains involved are the subdomains forming the composed domain. We
emphasize that our aim is to develop a DD method whose solutions converge to the
solution of the original problem $\varepsilon$-uniformly with respect to both the number
of mesh points and the number of iterations.

* This research was supported by the Russian Foundation for Basic Research (grant
No. 98-01-00362) and partially by the NWO grant (dossiernr. 047.008.007).

Let us formulate the problem under consideration. On the vertical strip $D$,

$$D = \{x : -d_0 < x_1 < d^0,\ x_2 \in R\}, \quad d_0, d^0 > 0, \qquad (1.1)$$

which consists of two subdomains-strips $D_1$ and $D_2$, where $D_1 = D \cap \{x_1 < 0\}$,
$D_2 = D \cap \{x_1 > 0\}$, we consider the Dirichlet problem for the following singularly
perturbed elliptic equation of reaction-diffusion type

$$L_k u(x) \equiv \Big\{ \varepsilon^2 \sum_{s=1,2} a_{ks}(x) \frac{\partial^2}{\partial x_s^2} - c_k(x) \Big\} u(x) = f_k(x), \quad x \in D_k, \qquad (1.2a)$$

$$u(x) = \varphi(x), \quad x \in \Gamma, \quad k = 1, 2. \qquad (1.2b)$$

The concentrated source acts on the interface boundary $\Gamma^* = \{x_1 = 0\} \times R$:

$$[u(x)] = 0, \quad l\,u(x) \equiv \varepsilon \Big[ a_1(x) \frac{\partial}{\partial x_1} u(x) \Big] = -q(x), \quad x \in \Gamma^*. \qquad (1.2c)$$

Here $\Gamma = \bar D \setminus D$, the functions $a_{ks}(x)$, $c_k(x)$, $f_k(x)$ are assumed to be sufficiently
smooth on $\bar D_k$, and the functions $\varphi(x)$ and $q(x)$ on $\Gamma$ and $\Gamma^*$, respectively;
moreover$^1$

$$0 < a_0 \le a_{ks}(x) \le a^0, \quad 0 < c_0 \le c_k(x) \le c^0, \quad |f_k(x)| \le M, \quad x \in \bar D_k,$$

$$|\varphi(x)| \le M, \quad x \in \Gamma; \quad |q(x)| \le M, \quad x \in \Gamma^*; \quad k, s = 1, 2; \qquad (1.3)$$

the parameter $\varepsilon$ takes arbitrary values from the half-interval (0,1]. The symbol
$[v(x)]$ denotes the jump of the function $v(x)$ when passing through $\Gamma^*$ from $D_1$
to $D_2$:

$$[u(x)] = \lim_{x^+ \to x,\ x^+ \in D_2} u(x^+) - \lim_{x^- \to x,\ x^- \in D_1} u(x^-), \quad x \in \Gamma^*,$$

and

$$\Big[ a_1(x) \frac{\partial}{\partial x_1} u(x) \Big] = \lim_{x^+ \to x,\ x^+ \in D_2} a_{21}(x^+) \frac{\partial}{\partial x_1} u(x^+) - \lim_{x^- \to x,\ x^- \in D_1} a_{11}(x^-) \frac{\partial}{\partial x_1} u(x^-), \quad x \in \Gamma^*.$$

It is also convenient to write (1.2a) in the form $L u(x) = f(x)$, $x \in D \setminus \Gamma^*$,
where $L = L_k$, $f(x) = f_k(x)$ for $x \in D_k$.

As $\varepsilon \to 0$, boundary and transition layers appear in a neighbourhood of the
sets $\Gamma$ and $\Gamma^*$ respectively.

For problem (1.2), (1.1) we are to construct a domain decomposition scheme
by using, as a base scheme, the $\varepsilon$-uniformly convergent scheme from [2].



2 The Base Scheme for Problem (1.2), (1.1) 

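Let us first give an iteration-free difference scheme. On the set $\bar D$ we introduce
the rectangular grid

$$\bar D_h = \bar\omega_1 \times \omega_2, \qquad (2.1)$$

where $\bar\omega_1 = \{x_1^i : -d_0 = x_1^0 < \ldots < x_1^{N_1} = d^0\}$, $\omega_2 = \{x_2^j : x_2^j < x_2^{j+1},\ j = \ldots, -1, 0, 1, 2, \ldots\}$
are arbitrary (possibly nonuniform) meshes on $[-d_0, d^0]$ and on the $x_2$-axis
respectively; the point $x_1 = 0$ belongs to $\bar\omega_1$. Assume $h \le M N^{-1}$, where $h$ is the
maximum of the mesh-sizes, $N = \min[N_1, N_2]$, and $N_1 + 1$ and $N_2 + 1$ are the number
of nodes in the mesh $\bar\omega_1$ and the minimal number of nodes in $\omega_2$ on a unit
interval. Denote $x_1 = 0$ by $x_1^{i_0}$.

$^1$ Here and below we denote by M (m) sufficiently large (small) positive constants
which are independent of $\varepsilon$ and the discretization parameters. Throughout this
paper, the notation $w_{(j.k)}$ indicates that $w$ is first defined in equation (j.k).

We approximate problem (1.2), (1.1) by the difference scheme [1]

$$\Lambda z(x) = f^h(x), \quad x \in D_h, \qquad (2.2a)$$

$$z(x) = \varphi(x), \quad x \in \Gamma_h. \qquad (2.2b)$$

Here $D_h = D \cap \bar D_h$, $\Gamma_h = \Gamma \cap \bar D_h$,

$$\Lambda \equiv \varepsilon^2 \sum_{s=1,2} a_{ks}(x)\, \delta_{\bar x_s \hat x_s} - c_k(x), \quad x \in D_{kh}, \qquad (2.2c)$$

$$\Lambda \equiv \varepsilon^2\, 2\, \big(h_1^{i_0} + h_1^{i_0-1}\big)^{-1} \big\{ a_{21}(x)\, \delta_{x_1} - a_{11}(x)\, \delta_{\bar x_1} \big\}, \quad x = (x_1^{i_0}, x_2) \in \Gamma_h^*, \qquad (2.2d)$$

$$f^h(x) = f_k(x), \quad x \in D_{kh}, \quad k = 1, 2, \qquad (2.2e)$$

$$f^h(x) = -2\, \varepsilon\, \big(h_1^{i_0} + h_1^{i_0-1}\big)^{-1} q(x), \quad x \in \Gamma_h^*, \qquad (2.2f)$$

with $\delta_{\bar x_s \hat x_s} z(x)$ being the second (central) difference derivative, e.g.,
$\delta_{\bar x_1 \hat x_1} z(x) = 2\,(h_1^i + h_1^{i-1})^{-1} \big(\delta_{x_1} z(x) - \delta_{\bar x_1} z(x)\big)$,
$\delta_{x_1} z(x) = (h_1^i)^{-1} \big(z(x_1^{i+1}, x_2) - z(x)\big)$,
$\delta_{\bar x_1} z(x) = (h_1^{i-1})^{-1} \big(z(x) - z(x_1^{i-1}, x_2)\big)$, $x = (x_1^i, x_2)$ (cf. [1]).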

Scheme (2.2), (2.1) is $\varepsilon$-uniformly monotone [1]. By applying the majorizing
technique and taking account of a priori estimates, we find the error bound
$|u(x) - z(x)| \le M \ldots$, $x \in \bar D_h$. Thus, scheme (2.2), (2.1) does
not converge $\varepsilon$-uniformly (that is, for all values of $\varepsilon$, no matter how small).

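We are now in a position to define the piecewise uniform grid from [2] which
is condensed in the neighbourhood of the boundary and transition layers:

$$\bar D_h^* = \bar\omega_1^* \times \omega_2, \qquad (2.3)$$

where $\omega_2 = \omega_{2(2.1)}$, and $\bar\omega_1^* = \bar\omega_1^*(\sigma)$ is a piecewise uniform mesh on $[-d_0, d^0]$.
To construct $\bar\omega_1^*(\sigma)$, we divide $[-d_0, d^0]$ into five parts $[-d_0, -d_0 + \sigma]$,
$[-d_0 + \sigma, -\sigma]$, $[-\sigma, \sigma]$, $[\sigma, d^0 - \sigma]$ and $[d^0 - \sigma, d^0]$. In each part we use a
uniform mesh, with step-size $h^{(1)} = 8 \sigma N_1^{-1}$ on the subintervals $[-d_0, -d_0 + \sigma]$,
$[-\sigma, \sigma]$, $[d^0 - \sigma, d^0]$ and step-size $h^{(2)} = 2 (d_0 + d^0 - 4\sigma) N_1^{-1}$ on $[-d_0 + \sigma, -\sigma]$,
$[\sigma, d^0 - \sigma]$. We take $\sigma = \min[\, 3^{-1} d_0,\ 3^{-1} d^0,\ m^{-1} \varepsilon \ln N_1 \,]$ as a function of $\varepsilon$
and $N_1$, where $0 < m < m_0$, $m_0 = \min_{k,\,\bar D_k} [\, a_{k1}^{-1}(x)\, c_k(x) \,]^{1/2}$. See also [3,4] for
more details of this fitted mesh technique.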
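
The construction of the piecewise uniform mesh can be sketched in a few lines. The code below is an illustration only, under assumptions that are not part of the paper: the strip is taken symmetric ($d_0 = d^0 = d$), so that each of the two coarse parts receives $N_1/4$ cells, $N_1$ is required to be divisible by 8, and the constant $m$ is passed in as a hypothetical parameter. The function name `piecewise_uniform_mesh` is likewise hypothetical.

```python
import math


def piecewise_uniform_mesh(d, eps, n1, m=1.0):
    """Nodes of a piecewise uniform mesh on [-d, d], refined near -d, 0 and d.

    Assumptions of this sketch: symmetric strip (d_0 = d^0 = d) and
    n1 divisible by 8; m plays the role of the constant in sigma.
    """
    if n1 % 8 != 0:
        raise ValueError("n1 must be divisible by 8")
    # Transition parameter: sigma = min(d/3, eps * ln(n1) / m).
    sigma = min(d / 3.0, eps * math.log(n1) / m)

    # Five parts: fine, coarse, fine (around the interface), coarse, fine.
    breaks = [-d, -d + sigma, -sigma, sigma, d - sigma, d]
    cells = [n1 // 8, n1 // 4, n1 // 4, n1 // 4, n1 // 8]

    nodes = [breaks[0]]
    for left, right, k in zip(breaks[:-1], breaks[1:], cells):
        step = (right - left) / k  # 8*sigma/n1 on fine parts
        nodes.extend(left + step * j for j in range(1, k + 1))
    return sigma, nodes
```

With these cell counts the fine parts have step $8\sigma/N_1$ and the coarse parts have step $2(2d - 4\sigma)/N_1$, which agrees with the step-sizes quoted in the text for $d_0 = d^0 = d$; the interface point $x_1 = 0$ is a mesh node because the middle part contains an even number of cells.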

Theorem 1. Let the data of problem (1.2), (1.1) satisfy condition (1.3), and
assume $a_{ks}, c_k, f_k \in C^{K+2+\alpha}(\bar D_k)$, $\varphi \in C^{K+2+\alpha}(\Gamma)$, $q \in C^{K+2+\alpha}(\Gamma^*)$, $s, k = 1, 2$,
$\alpha > 0$ with $K = 4$. Then the solution of scheme (2.2), (2.3) converges
$\varepsilon$-uniformly to the solution of (1.2), (1.1) with an error bound given by

$$|u(x) - z(x)| \le M \big[ N_1^{-1} \ln N_1 + N_2^{-1} \big], \quad x \in \bar D_h^*. \qquad (2.4)$$
3 Relaxation Scheme

In this section we consider a relaxation scheme assuming that the original
domain $D$ is partitioned into the non-overlapping subdomains $D_1$ and $D_2$.

1. At first, with problem (1.2) we associate the "nonstationary" problem

$$L_{(1.2)} w(x,t) = f(x), \quad (x,t) \in G \setminus S^*, \qquad (3.1a)$$

$$l\,w(x,t) = -q(x), \quad [w(x,t)] = 0, \quad (x,t) \in S^*, \qquad (3.1b)$$

$$w(x,t) = \psi(x,t), \quad (x,t) \in S. \qquad (3.1c)$$

$$\bar G = G \cup S, \quad G = G_1 \cup G_2 \cup S^*, \quad G_k = D_k \times (0, \infty), \quad k = 1, 2. \qquad (3.2)$$

Here $S^* = \Gamma^* \times (0, \infty)$, $S = S^L \cup S_0$, $S^L = \Gamma \times (0, \infty)$, $S_0 = \Gamma^* \times \{t = 0\}$,
the operator $l$ is defined by $l \equiv l_{(1.2)} - p\,\frac{\partial}{\partial t}$, $p > 0$, the function $\psi(x,t)$ is
sufficiently smooth and coincides with $\varphi(x)$ on $S^L$; besides, it is bounded on $S_0$.

The maximum principle holds for problem (3.1), (3.2). By estimating $|u(x) - w(x,t)|$
one can verify that the function $w(x,t)$, as $t \to \infty$, converges $\varepsilon$-uniformly
to the steady-state solution $u(x)$ of problem (1.2), (1.1).

2. For problem (3.1), (3.2) we apply the method of lines along x with explicit 
approximation of equation (3.1b) along t. 

We discretize the set G as follows. On D we construct the sets = 
X R, D°= {x\°+\d°) x i?, r*~ = x\°~^ x R, R*+ = x\°+^ x i?, 

where x 



Zq — 1 



= 0, x^i are the nodes of the space mesh uji. Suppose = 
U ^2 U U On the semiaxis t we introduce a uni- 

form mesh Wo with step r. Further we construct such a ’’grid” in time 



Gr = GrU^r, Gr = Gir U G 2 r U K, = (3.3) 

where Gkr = x wq, S'*® = T*® x wq, = F x wq, Sqt = F* x { t = 0, t }. In 
this way, the grid G,.(3 3^ is controlled by Ji, 62 and r, where = x^° — 

62 = — xf , = 0. Note that the values of (5i and 82 may depend on the 

perturbation parameter e. We use the notation 5i = i = 1,2. 

By applying the method of lines in x, we construct the semi-discrete scheme 

ArW'^{x,t)=f'^{x), {x,t)GGr, w'^ {x , t) = tp'^ {x , t) , {x,t)£Sr- (3.4) 

Here — '^(1.2)’ (^;^) ^ ? A^ — ^(2 2d) P ^ (^5 ^ z 

52 



z1,=A,i(x)(A_j_) 

At — £ Uii (x) ~Qx~ ) 

= ■!/'(3.i)(a:,t), {x,t) e S. 



e^a22{x) 

e^ai2{x) 



^^^2~'^2 (x), (x,t)eS: 

q 2 

^^-Ci(x), (x,f)GS: 



*0 
r 1 



*0 
T 1 



x G F*+, 



X G F* 



L 

T 1 



J /(1.2)(2^)z {x,t)GGr\S*, 

l^(2.2f)(^)’ (a^,i)eS;, 
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on the set Sor for t = 0, and for t = t is any 

sufficiently smooth function, moreover, ip'^{x,t) satisfies the Lipschitz condition 
with respect to t. 

3. Let us study scheme (3.4), (3.3). The condition 

r<p^sup inax I aii(a;) + « 2 i(a;) I ^ (3.5) 

is necessary and sufficient for scheme (3.4), (3.3) to be e-uniformly monotone. 
The condition 

T<mipinf min ( 5 i = 0^ ( ( 5 i ) , mi = 2 “^(a°)“^ ( 3 . 6 ) 

is sufficient and, up to a constant factor, necessary for e-uniform monotonicity. 
If = 0, then r = 0 and the process loses its stability. 

To scheme (3.4), (3.3) we put in correspondence the stationary scheme 



A°w°{x) = fix), x&D°, 
Here is defined above, U F, 

^o^f^r(3.4), 

l^(2.2d)> J’ 



w°{x) = <p{x), X £ r. 
fix) = fix), x&D°. 



(3.7) 



The scheme (3.7) approximating the boundary value problem (1.2), (1.1) is 
e-uniformly monotone. The following estimate is valid: 

\uix) — w^ix)\ < PfSi + 62) , X G . (3.8) 

Thus, for (5i, 62—^0 the function w^ix) converges to u(x) for fixed values of the 
parameter e, and it does e-uniformly under the condition 5i, ^2 = o(l)- The 
last condition is also necessary for e-uniform convergence of scheme (3.7). 
Under condition (3.5) we obtain the estimate 



jw^ix) — w'^ix,t)l < M 1 + mp ^ ( 1 -I- -I- (52 ) 



-t/r 



X e D°. 



(3.9) 



For fixed values of (5i, ^ 2 , x and for t 00 the function w'^ix, t) converges to the 
function w^ix). By (3.9) the condition t inf ( 1 -|- (5i -I- (52 ) ^0 is necessary 

and sufficient for e-uniform proximity of the functions w^ix) and w'^ix,t). 

By virtue of estimates (3.8) and (3.9) we have 



|m(x) — w'^ix,t)\ < m(^5i + §2 



l + mp ^ ( 1 -I- (5i -I - ^2 ) 



-t/i 



X G D 



Consequently, the function w'^ix,t) converges e-uniformly to m(x) under the 
condition ^ ^ 

5i, ^2^0, t^oo. (3.10) 

Recall that this result is valid in the case of condition (3.5) imposed on r. The 
condition (3.10) is also necessary for e-uniform convergence of scheme (3.4), (3.3). 
4 Non-overlapping Schwarz Method



The results obtained in Section 3 can be rigorously written in terms of non- 
overlapping domain decomposition Schwarz-like methods. 

1. We begin with consideration of the continuous Schwarz method. As a 
preliminary, on the set D we introduce the sets 



D^=dI[_}dI, Dl = Dl[jrl fc = l,2 and F 



*1 



(4.1) 



where D\ = x R, D\ = [x\”+\d°) x R, = {-do x R} {JR*, 

R^ = T* y {d° X i?}, R*^ = R*~ y r*^ y R*. These sets with upper index 1 
only slightly differ from the sets with upper index 0 considered in Section 3. 

Let us introduce the functions Uq(x), x G R*, u^(x), x G Df., k = 1,2, 
assuming Uq^x) = x G R*, u^{x) = ui'^(x,f"), x G dI, fc = 1,2 for 

= nr, n = 1, 2, ... . We find the functions (x), Uq{x) by solving the problem 



^Kix) = f{x), xGDI, 

_ f </j(x), x£R^f]R, 

\<(x), xGr,i\T, fc = l,2; 

u^x) = {f\x),u*^~^(x)) , xGT*, 



Kix) = 



(4.2) 



n = l,2,... . 

Here A = xeDlljR^; /^(x) = ^^(x), xGD^\R, 

F\f\x),u*^-\x)) = u--\x) +p-y { A(2.2d) y^*^-\x) - f\x) },x G T* 



<(x), X G r*, 
yx) = <( <(x), xgt*-, 
^(x), xGR*+ 



X G r 



*1 



Note that the function Mq(x), x G R* for n = 0,1 is given according to the 
problem formulation. We call the function u"’(x) = { m^(x), x G Dj^ , fc = 1, 2 }, 

X G H n = 1, 2, . . . the solution of scheme (4.2), (4.1). The value n defines the 
the current iteration in the iterative scheme (4.2), (4.1). 

All the considerations of Section 3 (estimates and conditions) remain valid
if $w^\tau(x,t)$ is replaced by $u^n(x)$ and $t$ by $n\tau$.

We say that scheme (4.2), (4.1) for n = n* is consistent with respect to both 
the limiting accuracy (for n = oo) of the solution and the number of iterations 
(or, briefly, consistent), if such an estimate is true: max|M"(x) —u°°{x) \ < 

M max I u(x) — u°°{x) |, n> n*, where n°°(x) = w^{x). 

For consistent monotone scheme (4.2), (4.1) the estimate like (3.8) holds: 

|n(x) — m"(x)| < M (^1 -I- ^2 ), X G , n > n*, 
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where n* < M maxi ( ) In ( ^ for m mini 6i < t < M min^ Si . 

2. Let us construct the discrete Schwarz method. For this we define the 
computational grids on the sets D , and 

Dl = D^r\Dh. Dli, = Dlr\Dh. r:^ = r*^r\Dh, (4.3) 

where either Dh = 1 ) or Dh = D^2.3)- We approximate problem (4.2), 

(4.1) by the totally discrete scheme 



AzJ^{x) 

Zk(x) 



z^(x) 



f\x), X e Dl,^, 

(ip(x), xer^f^ClF, 

\z^{x), xer^^\r, fc = i, 2 ; 

F\f{x),z*-\x)), x&F*, 



n = 1 , 2 , . . . . 



(4.4) 



Here yl = yl(2.2), x & /^(x) = 2)(^)> x&dI\F, 

F^(/i(x),z*”-i(a;)) = Zq "^( x) + p" V{yl( 2 . 2 d)^* ””^(^) “ f(x)}, x e 



z*"(x) 



z^{x), 


xGF*, ' 


zUx), 


xGF*-, 


^^x), 


x&n+ . 



the function Zq(x), xSF^ for n = 0, 1 is assumed to be given: Zq{x)= 2 )^^^’ 

$x \in \Gamma_h^*$, $n = 0, 1$. We call the function $z^n(x) = \{ z_k^n(x),\ x \in \bar D_{kh}^1,\ k = 1, 2 \}$,

$x \in \bar D_h$, $n = 1, 2, \ldots$ the solution of difference scheme (4.4), (4.3). It should be
noted that the functions $z_1^n(x)$, $x \in \bar D_{1h}^1$, and $z_2^n(x)$, $x \in \bar D_{2h}^1$, can be computed
in parallel at each iteration $n$.
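
The structure of such an iteration (update the interface value by an explicit relaxation step, then solve the two subdomain problems independently) can be illustrated on a one-dimensional model. The sketch below is a deliberate simplification and not scheme (4.4): it assumes the problem $\varepsilon^2 u'' - c\,u = f$ on $(-1, 1)$ with zero boundary values, constant $c$ and $f$, no concentrated source, a uniform mesh, $p = 1$, and a single interface node at $x = 0$ instead of the three-node interface set of the paper. All function names are hypothetical.

```python
def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm; row i: lower[i]*x[i-1] + diag[i]*x[i] + upper[i]*x[i+1]."""
    n = len(diag)
    c, d = [0.0] * n, [0.0] * n
    c[0], d[0] = upper[0] / diag[0], rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom if i < n - 1 else 0.0
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def schwarz_relaxation(eps, n_cells, tau, n_iter, c=1.0, f=-1.0):
    """Non-overlapping iteration with explicit relaxation at the interface node."""
    h = 1.0 / n_cells
    k = eps ** 2 / h ** 2
    m = n_cells - 1  # interior unknowns in each subdomain
    lower, diag, upper = [k] * m, [-(2.0 * k + c)] * m, [k] * m

    def solve_subdomain(g_left, g_right):
        # Dirichlet data enter the right-hand side of the first and last rows.
        rhs = [f] * m
        rhs[0] -= k * g_left
        rhs[-1] -= k * g_right
        return solve_tridiagonal(lower, diag, upper, rhs)

    g = 0.0  # initial guess for the interface value
    left, right = solve_subdomain(0.0, g), solve_subdomain(g, 0.0)
    for _ in range(n_iter):
        # Explicit step: residual of the difference equation at the interface,
        # evaluated on the previous iterate.
        residual = k * (left[-1] - 2.0 * g + right[0]) - c * g - f
        g += tau * residual
        # The two subdomain solves are independent of each other.
        left, right = solve_subdomain(0.0, g), solve_subdomain(g, 0.0)
    return left + [g] + right


def direct_solution(eps, n_cells, c=1.0, f=-1.0):
    """Reference: the same difference scheme solved on the whole interval."""
    k = eps ** 2 * n_cells ** 2
    m = 2 * n_cells - 1
    return solve_tridiagonal([k] * m, [-(2.0 * k + c)] * m, [k] * m, [f] * m)
```

In this model the iteration is monotone when $\tau \le (2\varepsilon^2/h^2 + c)^{-1}$, which mirrors the role of the restriction on $\tau$ in the text: the admissible step shrinks with the mesh size at the interface, and more iterations are then needed to reach the solution of the undecomposed scheme.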

3. We confine the convergence analysis to scheme (4.4), (4.3) in the class of 
piecewise uniform grids (2.3). 

Under condition (3.5) (condition (3.6)) for the solutions of scheme (4.4), (4.3) 
we obtain the estimate 



z(x) — z”(x)| < M 1 + mp ^(1 + <5 i + ^ 2 ) t 



X € Dh 



(4.5) 



no matter whether $\bar D_{h(2.1)}$ or $\bar D^*_{h(2.3)}$ is used.

On the grid $\bar D^*_{h(2.3)}$ condition (3.6) takes the form

$$\tau \le m_2\, p\, N_1^{-1} \ln N_1 \equiv \tau^0_{(4.6)}(N_1). \qquad (4.6)$$

In this way, taking account of estimates (2.4) and (4.5), we find

$$|u(x) - z^n(x)| \le M \big( N_1^{-1} \ln N_1 + N_2^{-1} + [\,1 + m p^{-1} \tau\,]^{-n} \big), \quad x \in \bar D_h^*, \qquad (4.7a)$$

$$|z(x) - z^n(x)| \le M\, [\,1 + m p^{-1} \tau\,]^{-n}, \quad x \in \bar D_h^*. \qquad (4.7b)$$

Thus, under condition (3.5) (condition (4.6)) and, besides this, provided that

$$n \tau \to \infty \quad \text{for} \quad N, n \to \infty, \qquad (4.8)$$

the difference scheme is $\varepsilon$-uniformly monotone, and the functions $z^n(x)$ converge
to $u(x)$ $\varepsilon$-uniformly. In the case of (4.6) complemented by the condition

$$n, N \to \infty \qquad (4.9)$$

the scheme possesses the property of $\varepsilon$-uniform monotonicity; however, it does
not converge even for fixed values of the parameter $\varepsilon$.

Under the condition

$$\tau = \tau^0_{(4.6)}(N_1) \qquad (4.10)$$

the solutions of scheme (4.4), (4.3), (2.3) satisfy the estimates

$$|u(x) - z^n(x)| \le M \big( N_1^{-1} \ln N_1 + N_2^{-1} + [\,1 + m N_1^{-1} \ln N_1\,]^{-n} \big), \quad x \in \bar D_h^*, \qquad (4.11a)$$

$$|z(x) - z^n(x)| \le M\, [\,1 + m N_1^{-1} \ln N_1\,]^{-n}, \quad x \in \bar D_h^*. \qquad (4.11b)$$

This scheme converges $\varepsilon$-uniformly under the condition $n N_1^{-1} \ln N_1 \to \infty$ for $n$,
$N \to \infty$. If this condition is violated, we have no convergence. Thus, the number
of iterations $n$ of the monotone scheme (4.4), (4.3), (2.3), (4.10), required for
$\varepsilon$-uniform convergence, is independent of $\varepsilon$ and grows unboundedly as $N \to \infty$.
For consistent scheme (4.4), (4.3), (2.3) under condition (4.10) we obtain

$$|u(x) - z^n(x)|,\ |z(x) - z^n(x)| \le M \big[ N_1^{-1} \ln N_1 + N_2^{-1} \big], \quad x \in \bar D_h^*, \qquad (4.12)$$

where $n \ge n^*$, and also $n^* \le M N_1$.
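
To see how the number of iterations grows with the mesh, one can evaluate the geometric factor in the iteration error directly. The short sketch below assumes that this factor has the form $[1 + m N_1^{-1} \ln N_1]^{-n}$, as read from estimate (4.11b), and uses the hypothetical value $m = 1$; the function name `iterations_for_tolerance` is also an assumption of this illustration.

```python
import math


def iterations_for_tolerance(n1, tol, m=1.0):
    """Smallest n with (1 + m*ln(n1)/n1)**(-n) <= tol.

    The form of the factor and the value of m are assumptions of this sketch.
    """
    growth = 1.0 + m * math.log(n1) / n1
    return math.ceil(math.log(1.0 / tol) / math.log(growth))


if __name__ == "__main__":
    for n1 in (64, 128, 256, 512):
        print(n1, iterations_for_tolerance(n1, 1e-2))
```

For a fixed tolerance the count behaves like $N_1 / \ln N_1$, independently of $\varepsilon$, which is in line with the statement that the required number of iterations does not depend on $\varepsilon$ but grows without bound as $N \to \infty$.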

Theorem 2. Conditions (3.5) and (4.8) are necessary and sufficient for
$\varepsilon$-uniform monotonicity of scheme (4.4), (4.3), (2.3) (just condition (3.5)) and for
$\varepsilon$-uniform convergence of the functions $z^n(x)$, $x \in \bar D^*_{h(2.3)}$, to the solution
$u(x)$ of problem (1.2), (1.1) and to the function $z_{(2.2)}(x)$, $x \in \bar D^*_{h(2.3)}$.
Conditions (4.6), (4.9) are not sufficient for convergence of $z^n(x)$ to $u(x)$ and $z(x)$
for fixed values of the parameter $\varepsilon$. The functions $z^n(x)$ under condition (3.5)
satisfy estimates (4.7b), (4.11b), and also (4.7a), (4.11a), (4.12), if, besides, the
hypotheses of Theorem 1 are fulfilled for K = S. 
Abstract. Numerical modeling of the human pelvic bone makes it possible
to determine the stress and strain distribution in bone tissue. The
general problems are: complex geometry, material structure and boundary
conditions. In the present paper some simplifications are made in the
numerical model. Homogeneous elastic properties of bone tissue are
assumed. A shell model and a solid model of the pelvic bone are analyzed.
The finite element method is applied. Some numerical results for the solid
and shell models are presented.



1 Introduction 

The pelvic bone is an important supporting element of the human locomotor system.
Linked with the spine through the sacral bone and with the head of the thighbone
in the hip joint, the pelvic bone transfers not only the weight of the upper body
parts but also the static and dynamic loads arising from stabilizing the body
stance and from locomotion. Thus, under physiological conditions there is a
certain stress distribution in the pelvic bone, which changes under the influence
of loads and of changed anatomical structures.

It is very difficult or impossible to measure the strain and stress "in vivo",
because the safety of the patient must be taken into account. There are only two
possibilities: model testing and numerical calculations. The complex geometry and
material structure of bone tissue, as well as the complexity of its state of load
and physiological reactions, allow a huge variety of acceptable assumptions in 3D
numerical models [1,4,5,6] and shell models [8,9], which influences the
calculation outcomes. It is well known that the stress distribution depends on the
boundary conditions. This can be observed in the numerical analysis of the human
pelvic bone [8,11,12,13]. One important question is how to model the boundary
conditions in the pelvic bone. It raises further questions: How should the contact
with other elements of the skeletal system be modeled? What do we know about the
stiffness of the supports? How should the load be modeled? And many others.

When the stresses are analyzed, it appears that the stress distribution depends
not only on the boundary conditions but also on the yield criterion. Not only the
values of the maximal reduced stresses change, but also the regions where they
occur.

* The work was done within the framework of research project 8T11F02618

The present work determines the stress and strain distribution in the human pelvic
bone using earlier studies [10] made in the Department for Strength of Materials
and Computational Mechanics of the Silesian University of Technology, adding new
elements to the structure of the numerical model and the load caused by muscle
tensions [3,4].

2 Numerical Model 

The numerical models are built on the basis of data from an anatomical specimen.
The 3D numerical model of the pelvic bone was worked out using the programs
PATRAN/NASTRAN. Eight-node solid elements representing the 3D stress
distribution were used for modeling. Figure 1 shows a view of the solid model.
Separate layers of solid elements model the cortical and the trabecular bone.
Figure 2 shows a cross-section of the solid model of the pelvic bone. The dark
elements model the cortical bone and the light ones the trabecular bone. At
present, homogeneous elastic properties within each group of tissue, as well as
continuity of the material, are assumed. The cortical bone is modeled by one layer
of elements, while the trabecular bone is modeled by one or more layers, depending
on the thickness of the bone tissue. On the basis of data from [1], Young's modulus
is assumed to be 15 GPa for the cortical bone and 100 MPa for the trabecular bone.

In the shell model, a Young's modulus of 10 GPa, a Poisson's ratio of 0.3 and a
quotient of compression strength to tensile strength of 1.5 are assumed. The
thickness of the elements depends on the real dimensions of the bone tissue.




Fig. 1. Solid model of human pelvic bone 



The stress and strain distribution in the human pelvic bone is a result of the
external load coming from the weight of the upper body part and from muscle
forces (Table 1). Following earlier works [4,5], the model takes into account 23
muscle tensions acting on the pelvic bone through tendons on the insertion
surfaces.

Muscle forces are represented in the numerical model as loads distributed over
the nodes on the insertion surfaces. The load is inclined to the surface of the
pelvic bone at an angle determined by the direction cosines of the line of action
of the muscle tension. Muscle force values are assumed for isometric conditions
[3,4]. Calculations were carried out for the minimum and the maximum load, i.e.
for loading by the force P only and for loading by the force P together with all
muscle forces at the same time. The muscle tension load does not take into account
the components caused by passive fiber stretch.

The results shown in the present paper are obtained for the shell and solid
models of the human pelvic bone. The results depend strongly on the boundary
conditions [8,9,12,14]. Here, in the acetabulum the boundary conditions are given
using 20 axial elements (in the radial co-ordinate). In the contact area with the
sacral bone the boundary conditions are given using axial elements in two
co-ordinates. In the pubic symphysis the boundary conditions are given in the
symmetry plane as restraints in selected co-ordinates (selected components at
the nodes).

Fig. 2. Cross-section of solid model of pelvic bone



Table 1. Maximum values of active muscle forces (muscle tensions) acting
on the pelvic bone

Muscle      RF      S       IP-1    IP-2    GRA
Fmax [N]    835     148     1006    1006    165

Muscle      GMx-1   GMx-2   ST      SM      BCL
Fmax [N]    1559    780     226     1359    745

Muscle      ADM-1   ADM-2   ADM-3   ADL     ADB     PC
Fmax [N]    354     1063    354     593     452     188

Muscle      GMd-1   GMd-2   GMd-3   GMu-1   GMu-2   GMu-3   TFL
Fmax [N]    425     425     425     249     249     249     286


In Table 1 the following symbols of muscle actons are used. Flexors: RF - rectus
femoris, S - sartorius, IP-1 - iliopsoas, IP-2 - psoas major, GRA - gracilis.
Extensors: GMx - gluteus maximus, ST - semitendinosus, SM - semimembranosus,
BCL - biceps femoris caput longum. Adductors: ADM - adductor magnus,
ADL - adductor longus, ADB - adductor brevis, PC - pectineus. Abductors and
muscles stabilising the pelvis: GMd - gluteus medius, GMu - gluteus minimus,
TFL - tensor fasciae latae.

3 Numerical Analysis 

Numerical results for the shell model are obtained for many cases of boundary
conditions, in which the stiffness of the axial elements is changed. There is only
one load case, the maximal load. The analysis of the results is performed using
three yield criteria: the maximum shear-stress (Tresca) criterion, the
shear-strain energy (von Mises) criterion and Burzyński's criterion (a
modification of the shear-strain energy criterion that takes into account the
different values of compression strength and tensile strength).
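
The three reduced stresses can be compared on a single stress state given by its principal stresses. The sketch below is illustrative only: the paper does not state the formulas, so the Burzyński reduced stress is written in its paraboloid form, which is an assumption here, with `kappa` denoting the quotient of compression strength to tensile strength (1.5 for the shell model). The function names are hypothetical.

```python
import math


def tresca(s1, s2, s3):
    """Reduced stress of the maximum shear-stress criterion."""
    return max(s1, s2, s3) - min(s1, s2, s3)


def von_mises(s1, s2, s3):
    """Reduced stress of the shear-strain energy criterion."""
    return math.sqrt(0.5 * ((s1 - s2) ** 2 + (s2 - s3) ** 2 + (s3 - s1) ** 2))


def burzynski(s1, s2, s3, kappa=1.5):
    """Reduced stress of the Burzynski criterion (paraboloid form, assumed).

    kappa is the quotient of compression strength to tensile strength;
    for kappa = 1 the result coincides with the von Mises stress.
    """
    first_invariant = s1 + s2 + s3
    vm = von_mises(s1, s2, s3)
    root = math.sqrt((kappa - 1.0) ** 2 * first_invariant ** 2 + 4.0 * kappa * vm ** 2)
    return ((kappa - 1.0) * first_invariant + root) / (2.0 * kappa)


if __name__ == "__main__":
    # Uniaxial compression of 150 (same units as the input stresses).
    state = (-150.0, 0.0, 0.0)
    print(tresca(*state), von_mises(*state), burzynski(*state))
```

In this form a compressive state is rated as less severe than by the other two criteria, because the material is assumed to be stronger in compression than in tension.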

Table 2 lists the maximal values of the reduced stresses for selected models.
The stress distributions for model 5 are shown in Figures 3 and 4 for the
shear-strain energy (von Mises) criterion and Burzyński's criterion respectively.



Table 2. Maximum values of reduced stresses for selected models

            Maximal values of reduced stresses
            in MPa for given yield criteria
Model NR    Tresca    Mises    Burzyński
1           69        62       56
2           87        84       55
3           143       133      93
4           149       144      145
5           157       150      112
6           171       162      157


In all cases, figure a) shows the result on the outer surface and figure b) on
the inner surface. It can be observed that the stress distribution in the
numerical model of the human pelvic bone depends on the boundary conditions and
on the yield criterion. Not only the values of the maximal reduced stresses
change, but also the regions where they occur.

The numerical analysis of the solid model is performed for only one case of
boundary conditions and for many load cases. The results presented in this paper
are obtained for one load case only. Figure 5 shows the reduced stresses (von
Mises) on the outer surface of the pelvic bone and Fig. 6 shows the displacements.

The values of the reduced stresses are given in kPa and the values of the
displacements in meters. Figure 7 shows the distribution of the reduced stresses
(von Mises) in a cross-section of the solid model of the pelvic bone. There, we can
see the maximal value of stresses on the outer surface in the cross-section and
the minimal value in the inner area. We can compare the results of the solid and
shell models. It can be observed that the maximal values of the reduced stresses
and the stress concentration areas are very close for the same load case and
boundary conditions. When the solid model is analyzed, not only the stresses on
the surface of the pelvic bone but also the inner stresses in the cross-section
can be taken into account.

Fig. 3. Von Mises reduced stresses in human pelvic bone:
a) outer surface b) inner surface




Fig. 4. Burzyński's reduced stresses in human pelvic bone:
a) outer surface b) inner surface

Fig. 5. Distribution of reduced stresses (von Mises) for solid model of pelvic
bone




Fig. 6. Displacement diagram for solid model of pelvic bone 



4 Conclusions 

The presented numerical models of the pelvic bone are built using finite elements
and assumed constraints, which map, with a small approximation, the anatomical
shape of the bone and the character of its joints: in the pubic symphysis, at the
point of contact with the sacral bone, and with the head of the thighbone in the
acetabulum of the hip joint.

Fig. 7. Distribution of reduced stresses (von Mises) in cross-section of solid
model of pelvic bone

Numerical analysis of the human pelvic bone shows that the stress distribution
depends on the boundary conditions, e.g. on the stiffness of the given restraints.
There also remains the problem of how to model the contact with other elements of
the skeletal system and what values of the material coefficients should be
assumed.

The stress distribution and the maximal value of the reduced stresses depend on
the yield criterion. In a selected model the difference reaches 50 MPa, i.e. over
30%. When Burzyński's criterion is applied, the maximal value of the reduced
stresses decreases. It seems that Burzyński's criterion is closer to the real
conditions.
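
The size of the difference between the criteria can be checked directly against the values of Table 2. The snippet below is only a worked arithmetic example on those tabulated values; the names `TABLE_2` and `criterion_spread` are hypothetical, and the relative difference is taken with respect to the largest of the three values, which is an assumption about how the percentage was formed.

```python
# Maximal reduced stresses in MPa from Table 2: (Tresca, Mises, Burzynski).
TABLE_2 = {
    1: (69, 62, 56),
    2: (87, 84, 55),
    3: (143, 133, 93),
    4: (149, 144, 145),
    5: (157, 150, 112),
    6: (171, 162, 157),
}


def criterion_spread(model):
    """Absolute and relative spread of the three criteria for one model."""
    values = TABLE_2[model]
    spread = max(values) - min(values)
    return spread, spread / max(values)


if __name__ == "__main__":
    for model in sorted(TABLE_2):
        spread, relative = criterion_spread(model)
        print(model, spread, round(100.0 * relative, 1))
```

For model 3 the spread between the Tresca and Burzyński values is 50 MPa, about 35% of the Tresca value, which matches the order of magnitude quoted in the conclusions.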

The shell model of the pelvic bone is easy to implement, and its maximal values
and distribution of reduced stresses are very close to those of the solid model.
Abstract: The use of high-speed engines requires mechanical gears with a high
ratio. In relative terms, the smallest mechanical gear is the cycloidal planetary
gear known as the Cyclo gear [2, 8-11]. The complex construction of the planet
wheels in the cycloidal planetary gear (Cyclo) makes its optimal design
practically impossible. To calculate the distribution of displacements and
stresses in the planet wheels with co-operating elements, FEM has been applied.
A series of numerical models of planet wheels was generated and, for an example
of a real gear, the values of forces, strains and stresses were calculated. In
the paper, forces and strains calculated with FEM are used to check the
assumptions which so far have been applied only in the analytical method.



1 Introduction 

The Cyclo gear consists of a planetary gear (fig. 1a) and a straight-line
mechanism (fig. 1b) connected in series. This kind of connection gives a
compact gear with a stationary central gear (2), which mates with one or
two planet wheels (1, 1'). The planet wheels are driven by the eccentric
yoke (3), fig. 1c.

Fig. 1. Kinematic scheme of planetary cycloidal gear (Cyclo): a), b), c)

For an immovable stationary wheel (2), the kinematic ratio is given
as follows [8, 11]:

$$i = -\frac{z_1}{z_2 - z_1} = -\frac{z_1}{\Delta z}, \qquad (1.1)$$

where:

$z_1 = z_s$ is the number of teeth of planet wheel 1 or 1',

$z_2 = z_k$ is the number of teeth (rolls) of stationary gear 2.
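
As a small worked example of relation (1.1), the ratio can be evaluated for given numbers of teeth. The sketch assumes that (1.1) has the form $i = -z_1/(z_2 - z_1)$, with the symbol $i$ and the sign convention (output rotating opposite to the input) taken as assumptions; the function name `cyclo_ratio` and the tooth numbers are hypothetical.

```python
def cyclo_ratio(z1, z2):
    """Kinematic ratio of a cycloidal gear with a stationary central gear.

    z1: number of teeth of the planet wheel,
    z2: number of teeth (rolls) of the stationary gear, with z2 > z1.
    Assumed form of relation (1.1): i = -z1 / (z2 - z1).
    """
    delta_z = z2 - z1
    if delta_z <= 0:
        raise ValueError("z2 must be greater than z1")
    return -z1 / delta_z


if __name__ == "__main__":
    # A planet wheel with 35 teeth in a ring of 36 rolls (illustrative values).
    print(cyclo_ratio(35, 36))
```

With a tooth difference of one, the magnitude of the ratio equals the number of teeth of the planet wheel, which is why a single stage of this gear can reach a high ratio in a compact housing.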

The main element of the Cyclo gear, which connects the other elements, is the
planet wheel 1 and 1'. The outline (meshing) of the planet wheel has the shape
of an equidistant of a shortened epicycloid (abbreviated ESE), and the central
gear 2 consists of a set of rolls [2, 4, 8-11]. The open-work shape of the planet
wheel, the complex state of load and the lack of more precise calculation
methods are the reasons for applying FEM to the design of the Cyclo gear.

The paper presents an attempt to apply FEM to the calculation of loads in
meshing and of the distribution of stresses and displacements at the highly
stressed points of the planetary gear.



2 Distribution of loads and state of equilibrium for planet wheels 

Torques acting on three shafts of the Cyclo gear must fulfil condition [10, 11], 
fig. 1 and 2: 



Mi-M2+Mh = 0, (2.1) 

where: 

Ml = 2Mc - torque, arising in planet wheels 1 i 1’, 

M 2 - torque giving load on interacting central wheel 2, 

Mh - input torque (driving) on eccentric shaft (yoke shaft) , 

The torques M2, M1 = 2Mc and Mh occurring in the Cyclo gear produce three 
unknown load distributions acting on the planet wheels and other elements: 

- the load distribution in meshing, i.e. the distribution of forces Pi between 
the teeth; 

- the load distribution (Qj) acting on the bolts of the straight-line mechanism; 

- the distribution of the eccentric force R into forces Qri loading the rolling 
elements (rolls) in the bearing hole. 

Figure 2 shows how the forces acting on planet wheel 1 or 1’ are balanced. 
The forces between teeth Pi and the forces Qj are functions of the 
displacements Si and Sj which arise at the points of application of the forces. 
The forces Qri, which result from resolving the force R, are functions of the 
geometrical features of the roller bearing and depend mainly on the radial 
clearance [4, 5, 12]. 

The forces between teeth Pi and the reaction forces Qj are calculated with 
the analytical method described by Kudriavcev and Lehmann [8, 9]. The 
analytical method applies a few simplifying assumptions (fig. 2): 

- strains in the planet wheel are omitted; the wheel is treated as a rigid disk 
without holes; 

- the potential strains Si at the points of action of the meshing forces Pi 
result from a slight angular displacement β of the planet wheel treated as a 
rigid plate, and the strains Sj of the plate of the straight-line mechanism 
result from the angle Δφ coming from the bolts of the straight-line 
mechanism; 

- the eccentric reaction force R loading the gear is a concentrated force; it 
is not distributed into components Qri and results from the conditions of 
equilibrium. 
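The rigid-disk assumption above can be sketched numerically. The example 
below is a simplified illustration, not the method of [8, 9] itself: it 
assumes a linear contact stiffness, so that each force is proportional to the 
local displacement β·h, where h is the lever arm of the force about the wheel 
centre. The lever arms are hypothetical values chosen only for the 
demonstration; the torque Mc = M1/2 = 440 Nm follows from M1 = 2Mc = 880 Nm. 

```python
def rigid_disk_forces(torque: float, arms: list[float]) -> list[float]:
    """Forces P_k = c * h_k chosen so that sum(P_k * h_k) equals the torque.

    Assumes a rigid wheel rotated by a small angle and equal, linear
    contact stiffness at every active tooth (illustrative assumption).
    """
    s = sum(h * h for h in arms)
    if s == 0:
        raise ValueError("at least one lever arm must be non-zero")
    c = torque / s
    return [c * h for h in arms]


# Hypothetical lever arms of the active teeth, in metres.
arms = [0.010, 0.025, 0.040, 0.048, 0.040, 0.025]
forces = rigid_disk_forces(440.0, arms)  # Mc = 440 N*m
for h, p in zip(arms, forces):
    print(f"h = {h:.3f} m -> P = {p:.0f} N")
```

In this simplified picture the tooth with the largest lever arm always carries 
the largest force, whatever the real stiffness of the wheel may be. 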



3 Modelling of meshing of real planet wheels with strainable rolls 
and bolts in numerical method 



Assumptions made for the numerical method (fig. 2): 

- the planet wheel and co-operating elements are strainable; a linear elastic 
material model with E = 2.08·10^5 MPa and ν = 0.3 has been assumed; 

- the displacements Si and Sj at the points of action of forces Pi and Qj 
result from the open-work construction of the planet wheel and from 
deflections of the rolls of the stationary wheel and the bolts of the 
straight-line mechanism; 

- the active force loading the gear is the eccentric reaction force R resulting 
from the input torque Mh and the conditions of equilibrium [2-4, 8, 9]; 

- the eccentric reaction force R loads the gear through a set of pressure 
forces Qri of the active rolls of the eccentric bearing. The way of calculating 
forces Qri as a function of the radial clearance of the central bearing has 
been presented by Chmurawa [3-5], table 1. 

Table 1 Load distribution for n active rolls in the central bearing joint, on 
the example of a Cyclo gear with ratio i = 19, power N = 6.4 kW, M1 = 2Mc = 
880 Nm, R = 10.3 kN and dm = 76.5 mm 

No  Pressure force [N]       Angle of applying   Angle of load distribution ψg [°] for (n) active rolls
                             force αi [°]        37.68 (3,4)  48.07 (5)  57.14 (5)  61.44 (5,6)  66.98 (5,6)
1   Qr1 = Q                  α1 = αR = 42.5°     5122         4468       3725       3518         3325
2   Qr2                      α2 = 18.5°          2842         3195       2952       2876         2805
3   Qr3                      α3 = -5.5°          0            6          894        1152         1392
4   Q’r3                     α’3 = 90.5°         0            6          894        1152         1392
5   Q’r2                     α’2 = 66.5°         2842         3195       2952       2876         2805
6   Radial clearance g [mm]                      0.19         0.09       0.045      0.033        0.022
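The entries of Table 1 can be checked against the equilibrium condition: in 
each column the roll forces Qri should add up vectorially to the eccentric 
force R = 10.3 kN. The sketch below assumes that every Qri acts radially at 
its tabulated angle and that R acts along αR = 42.5°; the variable names are 
illustrative. 

```python
import math

ALPHA_R = 42.5                               # direction of R [deg]
ANGLES = [42.5, 18.5, -5.5, 90.5, 66.5]      # Qr1, Qr2, Qr3, Q'r3, Q'r2 [deg]
COLUMNS = {                                  # radial clearance g [mm] -> forces [N]
    0.19:  [5122, 2842, 0, 0, 2842],
    0.09:  [4468, 3195, 6, 6, 3195],
    0.045: [3725, 2952, 894, 894, 2952],
    0.033: [3518, 2876, 1152, 1152, 2876],
    0.022: [3325, 2805, 1392, 1392, 2805],
}


def resultant(forces):
    """Components of the roll-force resultant along and across the line of R."""
    along = sum(q * math.cos(math.radians(a - ALPHA_R)) for q, a in zip(forces, ANGLES))
    across = sum(q * math.sin(math.radians(a - ALPHA_R)) for q, a in zip(forces, ANGLES))
    return along, across


for g, forces in COLUMNS.items():
    along, across = resultant(forces)
    print(f"g = {g} mm: along R = {along:.0f} N, across R = {across:.1f} N")
```

Under these assumptions every column gives a resultant of about 10.3 kN along 
R and a vanishing transverse component, so a smaller clearance only spreads 
the same load over more rolls. 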



The meshing of toothed wheels with the elements of the Cyclo gear is 
characterised by coplanar forces and can be modelled in a plane state of 
stress [1, 3, 4, 7]. Knowing the profile of the outside edge (equidistant) and 
the inside edges (circles), a geometrical model of the planet wheel was 
created as a surface representing the real wheel, which was discretized with 
8-node 2D surface elements. The model of the wheel was divided into 5590 
elements and the grid has 18526 nodes. A high grid density was applied, 
particularly near the edges. Some nodes were made to coincide with 
characteristic points of the meshing of the toothed wheels, namely the contact 
points of the planet wheel teeth with the rolls and the contact points of the 
bolts and bearing rollers in the internal holes. 

Fig. 2 System of forces, distribution of stresses σred in MPa, displacements, 
torques and rule of balancing forces acting on planet wheels 

The meshing of strainable planet wheels 1 and 1’ of nominal meshing with the 
rolls of the stationary wheel and the bolts of the straight-line mechanism 
depends on the way the loads are taken over. That is why 8 different possible 
cases (models) of co-operation of the external (1’) and internal (1) planet 
wheels with the remaining elements of the gear have been considered [3], 
fig. 2, 4, 5: 

- model 1: planet wheel made of 2D surface elements, rolls and bolts modelled 
as rigid rod elements; 

- model 2: planet wheel with rolls and bolts modelled by 2D elements; 

- models 3, 4, ..., 9, 10: planet wheel made of 2D elements, rolls and bolts 
modelled as strainable rod elements with different substitute stiffness 
resulting from the level of wear caused by long-lasting operation. For 
example, in model 3 the rolls and bolts take over the load without 
participation of the sleeves (at relatively high clearances), and in model 9 
the rolls and bolts take over the load together with the sleeves. 



4 Results and comparative analysis of numerical calculations 

FEM calculations were made with the MSC Patran/Nastran software for a given 
size of Cyclo gear with the parameters given in Table 1. The results of the 
calculations (for the numerical models assumed above) include: 

- the distribution of meshing force Pi and reaction force Qj, 

- the distribution of reduced Huber-Mises stresses σred in the planet wheel, 

- the distribution of displacement Si of the points of meshing. 




Fig. 3 Changes of meshing force value Pi in Cyclo gear for different numerical 
models of co-operating planet wheel 1’ with elements of gear 
The distribution of meshing force Pi illustrates its change during the 
half-cycle of tooth load and can be shown as a function of the rotation angle 
γ of the driving shaft, fig. 3. Fig. 3 sets out the values calculated 
analytically (symbol a) and numerically (symbols 1 - 10). For a half-turn of 
the driving shaft there are 3 (not 1) cycles of change of the meshing force 
and the highest value Pi = Pmax is 9.5 - 33.5 analytically. Similarly, for a 
half-turn of the planet wheel there are 2 (not 1) cycles of change of force Qj 
and the highest value Qj = Qmax is 5.5 - 43. Comparing the values, one can 
notice the high rigidity of the rolls and bolts in the gear, diagrams 1 and 9, 
fig. 3. For example, the differences between the force values Pi for ideally 
rigid (model 1) and real rolls and bolts (model 3) are small, only 0.3 - 1.8. 

The application of FEM also enabled the determination and visualisation of the 
distribution of reduced Huber-Mises stresses σred together with the 
distribution of the forces creating them. For example, fig. 2 and 4 show the 
distribution of loads and the distribution of stresses σred which can occur in 
planet wheel 1’ for model 9. 




Fig. 4 Local distribution of stresses σred in MPa and forces Pi and Qj in N in 
the most heavily stressed fragment of planet wheel 1’ (for model 9) 

To identify the regions where the maximal stresses σred occur, enlarged 
fragments of the planet wheels were prepared. The highest stress level occurs 
at the inside edge of the bearing hole, σred = 150 MPa, at the outside edge 
(equidistant), σred = 140 MPa, and between the holes, σred = 60 - 90 MPa, 
while at the unloaded side it is approx. 0 MPa. 
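For reference, the reduced Huber-Mises stress quoted above can be evaluated 
from the in-plane stress components of a 2D element. The sketch below uses 
the standard plane-stress form of the criterion; it illustrates the quantity 
being plotted and is not the post-processing routine used by the authors. 

```python
import math


def huber_mises_plane_stress(sx: float, sy: float, txy: float) -> float:
    """Reduced stress for plane stress: sqrt(sx^2 - sx*sy + sy^2 + 3*txy^2)."""
    return math.sqrt(sx * sx - sx * sy + sy * sy + 3.0 * txy * txy)


# Uniaxial tension of 150 MPa gives a reduced stress of 150 MPa.
print(huber_mises_plane_stress(150.0, 0.0, 0.0))  # 150.0
```

A state of pure shear τ gives σred = √3·τ, which is why shear-dominated 
regions between the holes can still show noticeable reduced stress. 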

The analysis of the points of contact between the planet wheel teeth and the 
rolls of the co-operating wheel requires knowing the displacements Si of 
these points (see fig. 2). As an example, fig. 5 shows the distribution of 
displacement Si of active and passive points of meshing for model 3 (rolls and 
bolts without sleeves) and model 9 (rolls and bolts with sleeves). 




Fig. 5 Distribution of displacement Si of points of meshing of planet wheel 
with co-operating wheel 



5 Conclusions 

• The application of FEM enables calculation and visualisation of the 
unknown stress distribution σred and displacement distribution Si at 
practically every point of the planet wheel. Calculating the stresses 
analytically is practically impossible. 

• The dominant influence on the distribution of meshing forces Pi and 
reaction forces Qj in the cycloidal gear comes from the construction and 
elastic features of the planet wheel. The elastic features of the 
interacting elements (rolls and bolts) influence the forces Pi and Qj to 
a relatively lower degree. 

• The distributions of meshing forces Pi, reaction forces Qj and 
displacement Si calculated with FEM follow different courses from the 
distributions determined analytically. This results from the analytical 
method omitting the real shape and material features of the planet wheels. 

• The curvilinear outside edge and regular inside holes of the planet wheels 
in the gear create a higher frequency of changes and relatively higher 
values of the meshing forces and of the reaction forces of the bolts of the 
straight-line mechanism. The presented results concern a cycloidal gear 
with transmission ratio i = 19 and power N = 6.4 kW. 







• The presented numerical analysis of loads, stresses and displacements can 
be applied to optimise the distribution of loads by modifying the 
meshing in the Cyclo gear. 



References 

1. Chandrupatla T.R., Belegundu A.D. (1991) Introduction to FEM in 
engineering. Prentice Hall, London. 

2. Chmurawa M., Olejek G. (1994) Zazębienie cykloidalne przekładni 
planetarnej. Zeszyty Naukowe Pol. Śl., seria Transport Z. 22/94, Gliwice. 

3. Chmurawa M., John A., Kokot G. (1999) The influence of numerical 
model on distribution of loads and stress in cycloidal planetary gear. 
Proc. 4th International Scientific Colloquium CAx Techniques, 
Bielefeld, Germany. 

4. Chmurawa M. (1999) Distribution of loads in cycloidal planetary gear. 
Proc. International Conference "Mechanics’99", Kaunas University, 
Lithuania. 

5. Freda-Krzemiński H. (1985) Łożyska toczne. PWN, Warszawa. 

6. Hamerak K. (1979) Das Cyclogetriebe - eine geniale Idee und ihre 
technische Verwirklichung. Technik Heute. Verlag Christiani, nr 6, Bonn. 

7. Kleiber M. (Ed.) (1998) Handbook of Computational Solid Mechanics. 
Springer Verlag, Berlin-Heidelberg. 

8. Kudriavcev V.N. (1966) Planetarnyje peredaci. Masinostroenije, 
Moskva-Leningrad. 

9. Lehmann M. (1976) Berechnung und Messung der Kräfte in einem 
Zykloiden-Kurvenscheiben-Getriebe. Dissertation. Technische Universität, 
München. 

10. Müller H.W. (1971) Die Umlaufgetriebe. Springer Verlag, Berlin. 

11. Müller L. (1983) Przekładnie obiegowe. PWN, Warszawa. 

12. Palmgren A. (1964) Grundlagen der Wälzlagertechnik. Franckh'sche 
Verlagshandlung, Stuttgart. 



This work was carried out within the framework of KBN project No. 7T07C03815. 




Author Index 



Abdallah, H 1 

Amodio, P 10 

Ansari, A. R 18 

Bael, A. Van 423 

Barel, M. Van 27 

Barraud, A 35, 521 

Barrio, R 42, 51 

Baryamureeba, V 59 

Begoña Melendo, M 586 

Bellavia, S 68 

Bencheva, G 76 

Bergamaschi, L 84 

Bertaccini, D 93 

Blanes, S 102 

Bojovic, D 110 

Braianov, I. A 117 

Brainman, I 125 

Bujanda, B 133 

Cai, W 527 

Cano, B 144 

Capizzano, S. S 152 

Cardoso, J. R 160 

Carpentieri, B 170 

Casas, F 102 

Cervantes, L 179 

Chaitin-Chatelin, F 187 

Chan, R. H 615 

Charlier, R 222 

Chen, J.-Y 475 

Chmurawa, M 772 

Christov, N 35, 521 

Clavero, C 316, 350 

Coakley, J 198 

Collar, A. F 179, 207 

Collin, F 222 

Condevaux-Lanloy, Ch 214 

D’yakonov, F. G 273 

Datcheva, M 222 

Dent, D 230 

Di Lena, G 513 

Diderich, G 238 

Dimitriu, G 246 

Dimov, I 359, 636 

Dobrev, V 253 

Dooren, P. Van 560 

Duff, I. S 170 

Dunne, R. K 265 

Epelly, O 214 

Farago, I 285 

Farrell, P. A 292, 723 

Foschi, P 490 

Fragniere, F 214 

Fuertes, A.-M 198 

Gahan, B 304 

Gaika, A 741 

Gaspar, F 316 

Georgiev, K 325 

Georgieva, G 333 

Grebennikov, A. I 207 

Giraud, L 170 

Goolin, A. V 341 

Gracia, J. L 350 

Gurov, T 359 

Gustavson, F 333 

Gutierrez, J. M 368 

Hamza, M 1 

Hassanov, V 377 

Hegarty, A. F 18, 292, 723 

Heinig, G 27, 385 

Hemker, P. W 393, 402 

Hernandez, M. A 368 

Hoppe, R. H. W 414 

Hosoda, Y 608 

Houtte, P. Van 423 

Huffel, S. Van 560 

Iankov, R 423 

Iavernaro, F 513 

Ivanov, I 377 

Jin, X.-Q 505 

John, A 764, 772 




Jorge, J. C 133 

Jovanovic, B. S 431, 439 

Kandilarov, J. D 431, 451 

Karatson, J 459 

Karaivanova, A 552 

Kaschiev, M 692 

Keer, R. Van 467 

Kincaid, D. R 475 

Kitagawa, T 608 

Kolev, T 482 

Koleva, M 692 

Kontoghiorghes, E. J 490 

Kravanja, P 27 

Kucaba-Pietal, A 230, 498 

Kwan, W. C 615 

Lecheva, A 749 

Lei, S.-L 505 

Leite, F. S 160 

Lesecq, S 35, 521 

Li, Z 527 

Lirkov, I 535 

Lisbona, F 316, 350 

MacMullen, H 544 

Macconi, M 68 

Margenov, S 482, 535 

Marty, W 238 

Mascagni, M 552 

Mastronardi, N 560 

Matus, P. P 568 

Mazhukin, V. I 568 

Meini, B 578 

Miller, J. J. H 292, 304, 594, 723 

Mishima, T 628 

Mitrouli, M 602 

Mizoguchi, H 628 

Morini, B 68 

Mozolevsky, I. E 568 

Musgrave, A. P 594 

Nakata, S 608 

Ng, M. K 93, 615 

Nishimura, S 628 

O’Riordan, E 265, 292, 544, 723 

Oliveros, J. J. O 207 

Onofri, F 692 

Ostromsky, T 636 

Pan, V. Y 644 

Paprzycki, M 230, 535 

Pauletto, G 650 

Pavlov, V 658 

Perez, M.-T 198 

Petcu, D 666 

Petrova, S. I 414 

Plantie, L 187, 675 

Popivanov, P 684 

Possio, C. T 152 

Radev, St 692 

Ros, J 102 

Schulz, V. H 414 

Sczygiol, N 702 

Shigehara, T 628 

Shishkin, G. I 18, 265, 292, 304, 393, 544, 594, 710, 723, 756 

Shishkina, L. P 393 

Slavova, A 684 

Slodicka, M 467 

Sousa, E 732 

Sprengel, F 402 

Tadrist, L 692 

Takahashi, D 628 

Telega, J. J 741 

Tokarzewski, S 741 

Toledo, S 125 

Traviesas, E 187 

Tselishcheva, I. V 756 

Tzvetanov, I 636 



Vassilevski, P 253 

Vulkov, L. G 431, 439, 451 



Wasniewski, J 325 

Whitlock, P 359 



Yalamov, P 51, 333 

Young, D. M 475 

Zadorin, A. I 451 

Zheleva, I 749 

Zilli, G 84 

Zlatev, Z 636 




